Abstract

This is an invited review of bootstrap methods. It begins with an exposition of the bootstrap estimate of standard error for one-sample situations. Several examples are given, some involving quite complicated statistical procedures. The bootstrap is then extended to other measures of statistical accuracy, such as bias and prediction error, and to complicated data structures such as time series, censored data, and regression models. Several more examples illustrate these ideas. The last third of the paper deals mainly with bootstrap confidence intervals. The paper ends with a FORTRAN program for bootstrap standard errors.

This work was supported by Office of Naval Research contract N00014-83-K-0472 and Public Health Service Grant 5 R01 GM21215-10.


The Bootstrap Method for Assessing Statistical Accuracy

B. Efron and R. Tibshirani
Stanford University


1. Introduction.

A typical problem in applied statistics is the estimation of an unknown parameter $\theta$. Two main questions arise: (1) which estimator $\hat\theta$ should be used? and (2) having chosen a particular $\hat\theta$, how accurate is it as an estimator of $\theta$? The bootstrap is a general methodology for answering the second question. It is a computer-based method that replaces theoretical analysis with considerable amounts of computation. As we shall see, the bootstrap can routinely answer questions that are far too complicated for traditional statistical analysis. Even for relatively simple problems, computer-intensive methods like the bootstrap are an increasingly good data-analytic bargain in an era of exponentially declining computational costs.

This paper describes the basis of the bootstrap theory, which is very simple, gives several examples of its use, and ends with a bootstrap computer program, also very simple. Related ideas like the jackknife, the delta method, and Fisher's information bound are also discussed. Most of the proofs and technical details are omitted. These can be found in the references given, particularly Efron (1982). Some of the discussion here is abridged from Efron and Gong (1983), and also from Efron (1984b).

Before beginning the main exposition, we describe how the bootstrap works in a problem where it is not needed: assessing the accuracy of the sample mean. Suppose that our data consist of a random sample from an unknown probability distribution F on the real line,

$$X_1, X_2, \ldots, X_n \overset{\text{iid}}{\sim} F. \tag{1.1}$$

Having observed $X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n$, we compute the sample mean $\bar x = \sum_{i=1}^n x_i/n$, and wonder how accurate it is as an estimate of the true mean $\theta = E_F\{X\}$.
If the second central moment of F is $\mu_2(F) = E_F X^2 - (E_F X)^2$, then the standard error $\sigma(F; n, \bar x)$, that is, the standard deviation of $\bar x$ for a sample of size n from distribution F, is

$$\sigma(F) = [\mu_2(F)/n]^{1/2}. \tag{1.2}$$

(The shortened notation $\sigma(F) \equiv \sigma(F; n, \bar x)$ is allowable because the sample size n and the statistic of interest $\bar x$ are known; only F is unknown.) This is the traditional measure of $\bar x$'s accuracy. Unfortunately we can't actually use (1.2) to assess the accuracy of $\bar x$, since we don't know $\mu_2(F)$. We can, however, use the estimated standard error

$$\bar\sigma = [\bar\mu_2/n]^{1/2}, \tag{1.3}$$

where $\bar\mu_2 = \sum_{i=1}^n (x_i - \bar x)^2/(n-1)$ is the unbiased estimate of $\mu_2(F)$.

There is a more obvious way to estimate $\sigma(F)$. Let $\hat F$ denote the empirical probability distribution,

$$\hat F: \text{probability mass } 1/n \text{ on } x_1, x_2, \ldots, x_n. \tag{1.4}$$

Then we can simply replace F by $\hat F$ in (1.2), obtaining

$$\hat\sigma = \sigma(\hat F) = [\mu_2(\hat F)/n]^{1/2}, \tag{1.5}$$

as the estimated standard error for $\bar x$. This is the bootstrap estimate. The reason for the name "bootstrap" will become apparent in Section 2, when we evaluate $\sigma(\hat F)$ for statistics more complicated than $\bar x$. Since

$$\mu_2(\hat F) = \frac{\sum_{i=1}^n (x_i - \bar x)^2}{n}, \tag{1.6}$$

we have $\hat\sigma = [(n-1)/n]^{1/2}\,\bar\sigma$. So $\hat\sigma$ is not quite the same as $\bar\sigma$, but the difference is too small to matter in most applications.
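As an illustration, the following minimal Python sketch (assuming NumPy, with made-up data values) computes both $\bar\sigma$ from (1.3) and the bootstrap estimate $\hat\sigma$ from (1.5). It checks that their ratio is $[(n-1)/n]^{1/2}$. The function names are hypothetical.

```python
import numpy as np

def se_unbiased(x):
    """Estimated standard error (1.3): unbiased variance estimate, divisor n - 1."""
    x = np.asarray(x, dtype=float)
    n = x.size
    mu2_bar = np.sum((x - x.mean()) ** 2) / (n - 1)
    return np.sqrt(mu2_bar / n)

def se_bootstrap_mean(x):
    """Bootstrap estimate (1.5): plug F-hat into (1.2), so mu_2(F-hat) has divisor n, as in (1.6)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    mu2_hat = np.sum((x - x.mean()) ** 2) / n
    return np.sqrt(mu2_hat / n)

x = np.array([2.1, 3.4, 1.9, 5.0, 4.2, 2.8, 3.3, 4.7])  # illustrative data only
ratio = se_bootstrap_mean(x) / se_unbiased(x)
print(ratio, np.sqrt((x.size - 1) / x.size))  # the two numbers agree
```

For n = 8 the ratio is about .935, and it approaches 1 as n grows.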

Of course we don't really need an alternative to formula (1.3) in this case. The trouble begins when we want a standard error for estimators more complicated than $\bar x$, for example a median, a correlation, or a slope coefficient from a robust regression. In most cases there is no equivalent of formula (1.2), which expresses the standard error $\sigma(F)$ as a simple function of the sampling distribution F. As a result, formulas like (1.3) do not exist for most statistics.

This is where the computer comes in. It turns out that we can always evaluate the bootstrap estimate $\hat\sigma = \sigma(\hat F)$ numerically, even without a simple expression for $\sigma(F)$. The evaluation of $\hat\sigma$ is a straightforward Monte Carlo exercise, described in the next section.

Standard errors are crude but useful measures of statistical accuracy. They are frequently used to give approximate confidence intervals for an unknown parameter $\theta$,

$$\theta \in \hat\theta \pm \hat\sigma z^{(\alpha)}, \tag{1.7}$$

where $z^{(\alpha)}$ is the $100 \cdot \alpha$ percentile point of a standard normal variate, e.g. $z^{(.95)} = 1.645$.

Interval (1.7) is sometimes good, and sometimes not so good. Sections 7 and 8 discuss a more sophisticated use of the bootstrap, which gives better approximate confidence intervals than (1.7).
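A minimal sketch of the standard interval (1.7), using only Python's standard library. As in (1.7), the parameter `alpha` is the percentile $\alpha$ itself, so `alpha=0.95` gives a two-sided 90% interval. The plugged-in values .776 and .127 are the law school correlation and its bootstrap standard error from Section 2. The function name is hypothetical.

```python
from statistics import NormalDist

def standard_interval(theta_hat, sigma_hat, alpha=0.95):
    """Interval (1.7): theta_hat -/+ sigma_hat * z^(alpha), where z^(alpha) is the
    100*alpha percentile of N(0, 1). alpha = 0.95 gives a two-sided 90% interval."""
    z = NormalDist().inv_cdf(alpha)
    return theta_hat - sigma_hat * z, theta_hat + sigma_hat * z

# Law school values from Section 2: theta_hat = .776, sigma_hat = .127
lo, hi = standard_interval(0.776, 0.127)
print(round(lo, 3), round(hi, 3))  # 0.567 0.985
```

The interval is symmetric about $\hat\theta$ by construction. This symmetry is exactly what the transformation tricks discussed next are meant to relax.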

The standard interval (1.7) takes the large-sample normal approximation $(\hat\theta - \theta)/\hat\sigma \sim N(0, 1)$ literally. Applied statisticians use a variety of tricks to improve this approximation. For instance, if $\theta$ is the correlation coefficient and $\hat\theta$ the sample correlation, then the transformation $\phi = \tanh^{-1}(\theta)$, $\hat\phi = \tanh^{-1}(\hat\theta)$ greatly improves the normal approximation, at least when the underlying sampling distribution is bivariate normal. The correct tactic then is to transform, compute the interval (1.7) for $\phi$, and transform this interval back to the $\theta$ scale.
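The transform, interval, back-transform tactic can be sketched as follows. This assumes the usual normal-theory approximation $\text{sd}(\hat\phi) \approx 1/\sqrt{n-3}$, which is an assumption of this sketch and is not stated in the text above. The function name is hypothetical.

```python
import math
from statistics import NormalDist

def tanh_interval(r, n, alpha=0.95):
    """Transform to phi = atanh(r), apply (1.7) on the phi scale, transform back.
    Assumes sd(phi_hat) is approximately 1/sqrt(n - 3) (normal-theory approximation)."""
    phi_hat = math.atanh(r)
    se_phi = 1.0 / math.sqrt(n - 3)
    z = NormalDist().inv_cdf(alpha)
    return math.tanh(phi_hat - z * se_phi), math.tanh(phi_hat + z * se_phi)

lo, hi = tanh_interval(0.776, 15)
print(round(lo, 3), round(hi, 3))  # asymmetric about .776: longer on the left
```

Unlike (1.7) applied directly on the $\theta$ scale, the back-transformed interval always stays inside $(-1, 1)$, and it is longer on the side away from $\pm 1$.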

We will see that bootstrap confidence intervals can automatically incorporate tricks like this, without requiring the data analyst to produce special techniques, like the $\tanh^{-1}$ transformation, for each new situation. An important theme of what follows is the substitution of raw computing power for theoretical analysis. This is not an argument against theory, of course, only against unnecessary theory. Most common statistical methods were developed in the 1920's and 1930's, when computation was slow and expensive. Now that computation is fast and cheap, we can hope for and expect changes in statistical methodology. This paper discusses one such potential change; Efron (1979b) discusses several others.

2. The Bootstrap Estimate of Standard Error.

This section describes the bootstrap estimate of standard error more carefully. For now we assume that the observed data $\mathbf{x} = (x_1, x_2, \ldots, x_n)$ consist of independent and identically distributed (i.i.d.) observations $X_1, X_2, \ldots, X_n \overset{\text{iid}}{\sim} F$, as in (1.1). Here F represents an unknown probability distribution on $\mathcal{X}$, the common sample space of the observations. We have a statistic of interest, say $\hat\theta(\mathbf{x})$, to which we wish to assign an estimated standard error.

Figure 1 shows an example. The sample space $\mathcal{X}$ is the positive quadrant of the plane. We have observed n = 15 bivariate data points, each corresponding to an American law school. Each point $x_i$ consists of two summary statistics for the 1973 entering class at law school i,

$$x_i = (\text{LSAT}_i, \text{GPA}_i); \tag{2.1}$$

$\text{LSAT}_i$ is the class's average score on a nationwide exam called "LSAT"; $\text{GPA}_i$ is the class's average undergraduate grade. The observed Pearson correlation coefficient for these 15 points is $\hat\theta = .776$. We wish to assign a standard error to this estimate.

Let $\sigma(F)$ denote the standard error of $\hat\theta$ as a function of the unknown sampling distribution F,

$$\sigma(F) = [\operatorname{Var}_F\{\hat\theta(\mathbf{x})\}]^{1/2}. \tag{2.2}$$

Of course $\sigma(F)$ is also a function of the sample size n and the form of the statistic $\hat\theta(\mathbf{x})$, but since both of these are known they needn't be indicated in the notation. The bootstrap estimate of standard error is

$$\hat\sigma = \sigma(\hat F), \tag{2.3}$$

where $\hat F$ is the empirical distribution (1.4), putting probability $1/n$ on each observed data point $x_i$. In the law school example, $\hat F$ is the distribution putting mass 1/15 on each point in Figure 1, and $\hat\sigma$ is the standard deviation of the correlation coefficient for 15 i.i.d. points drawn from $\hat F$.

In most cases, including that of the correlation coefficient, there is no simple expression for the function $\sigma(F)$ in (2.2). Nevertheless, it is easy to evaluate $\hat\sigma = \sigma(\hat F)$ numerically by means of a Monte Carlo algorithm. The algorithm uses the following notation: $\mathbf{x}^* = (x_1^*, x_2^*, \ldots, x_n^*)$ denotes n independent draws from $\hat F$, called a bootstrap sample. Because $\hat F$ is the empirical distribution of the data, a bootstrap sample is the same as a random sample of size n drawn with replacement from the actual sample $\{x_1, x_2, \ldots, x_n\}$.

Figure 1. The law school data (Efron 1979b). The data points, beginning with School No. 1, are (576, 3.39), (635, 3.30), (558, 2.81), (578, 3.03), (666, 3.44), (580, 3.07), (555, 3.00), (661, 3.43), (651, 3.36), (605, 3.13), (653, 3.12), (575, 2.74), (545, 2.76), (572, 2.88), (594, 2.96).

The Monte Carlo algorithm proceeds in three steps.


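(i) Using a random number generator, independently draw a large number of bootstrap samples, say $\mathbf{x}^*(1), \mathbf{x}^*(2), \ldots, \mathbf{x}^*(B)$.

(ii) For each bootstrap sample $\mathbf{x}^*(b)$, evaluate the statistic of interest, say $\hat\theta^*(b) = \hat\theta(\mathbf{x}^*(b))$, $b = 1, 2, \ldots, B$.

(iii) Calculate the sample standard deviation of the $\hat\theta^*(b)$ values,

$$\hat\sigma_B = \left\{\frac{\sum_{b=1}^B [\hat\theta^*(b) - \hat\theta^*(\cdot)]^2}{B-1}\right\}^{1/2}, \qquad \hat\theta^*(\cdot) = \frac{\sum_{b=1}^B \hat\theta^*(b)}{B}. \tag{2.4}$$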


It is easy to see that as $B \to \infty$, $\hat\sigma_B$ approaches $\hat\sigma = \sigma(\hat F)$, the bootstrap estimate of standard error. All we are doing is evaluating a standard deviation by Monte Carlo sampling. Later, in Section 9, we discuss how large B needs to be. For most situations, B in the range 50 to 200 is quite adequate. In what follows we usually ignore the difference between $\hat\sigma_B$ and $\hat\sigma$, calling both simply $\hat\sigma$.
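Steps (i) through (iii) translate almost line for line into code. The following sketch assumes NumPy and uses the law school data listed in Figure 1. The helper names `corr` and `bootstrap_se` are hypothetical, and B = 1000 with seed 0 are arbitrary choices.

```python
import numpy as np

# Law school data from Figure 1: (LSAT, GPA) for 15 schools
LAW = np.array([
    [576, 3.39], [635, 3.30], [558, 2.81], [578, 3.03], [666, 3.44],
    [580, 3.07], [555, 3.00], [661, 3.43], [651, 3.36], [605, 3.13],
    [653, 3.12], [575, 2.74], [545, 2.76], [572, 2.88], [594, 2.96],
])

def corr(data):
    """Pearson correlation between the two columns."""
    return np.corrcoef(data[:, 0], data[:, 1])[0, 1]

def bootstrap_se(data, statistic, B=200, rng=None):
    """Monte Carlo evaluation of the bootstrap standard error, steps (i)-(iii)."""
    rng = np.random.default_rng(rng)
    n = len(data)
    reps = np.empty(B)
    for b in range(B):
        idx = rng.integers(0, n, size=n)   # step (i): n draws with replacement from F-hat
        reps[b] = statistic(data[idx])     # step (ii): theta*(b)
    return reps.std(ddof=1), reps          # step (iii): formula (2.4), divisor B - 1

theta_hat = corr(LAW)
se_B, reps = bootstrap_se(LAW, corr, B=1000, rng=0)
print(round(theta_hat, 3), round(se_B, 3))
```

Resampling whole rows keeps each school's (LSAT, GPA) pair together, which is what drawing points $x_i^*$ from $\hat F$ means here. The printed standard error should land near the value .127 reported below, up to Monte Carlo error.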


Figure 2 shows the histogram of B = 1000 bootstrap replications of the correlation coefficient for the law school data. For convenient reference, the abscissa is plotted in terms of $\hat\theta^* - \hat\theta = \hat\theta^* - .776$. Formula (2.4) gives $\hat\sigma = .127$ as the bootstrap estimate of standard error. This can be compared with the usual normal theory estimate of standard error for $\hat\theta$,

$$\hat\sigma_{\text{NORM}} = \frac{1 - \hat\theta^2}{(n-3)^{1/2}}, \tag{2.5}$$

Johnson and Kotz (1970), p. 229.
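Evaluating (2.5) for the law school values $\hat\theta = .776$ and n = 15 is a one-line computation. This sketch assumes the reconstructed form of (2.5) shown above.

```python
import math

def se_norm(theta_hat, n):
    """Normal theory standard error (2.5) for a sample correlation."""
    return (1 - theta_hat ** 2) / math.sqrt(n - 3)

print(round(se_norm(0.776, 15), 3))  # 0.115
```

The result is somewhat smaller than the nonparametric bootstrap value .127.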


Figure 2. Histogram of B = 1000 bootstrap replications of $\hat\theta^*$ for the law school data. The normal theory density curve has a similar shape, but falls off more quickly at the upper tail.

There is another way to describe the bootstrap standard error: $\hat F$ is the nonparametric maximum likelihood estimate (MLE) of the unknown distribution F (Kiefer and Wolfowitz 1956). Consequently, the bootstrap estimate $\hat\sigma = \sigma(\hat F)$ is the nonparametric MLE of $\sigma(F)$, the true standard error.

In fact, nothing says that the bootstrap must be carried out nonparametrically. Suppose, for instance, that in the law school example we believed the true sampling distribution F must be bivariate normal. Then we could estimate F with its parametric MLE $\hat F_{\text{NORM}}$, the bivariate normal distribution having the same mean vector and covariance matrix as the data. The bootstrap samples at step (i) of the algorithm could then be drawn from $\hat F_{\text{NORM}}$ instead of $\hat F$, and steps (ii) and (iii) carried out as before.
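The parametric variant changes only step (i). In this sketch, each bootstrap sample is drawn from a bivariate normal with the data's mean vector and its maximum likelihood covariance matrix (divisor n, via `bias=True`). It assumes NumPy's `Generator.multivariate_normal`. The function name, B, and the seed are hypothetical choices.

```python
import numpy as np

# Law school data from Figure 1
LAW = np.array([
    [576, 3.39], [635, 3.30], [558, 2.81], [578, 3.03], [666, 3.44],
    [580, 3.07], [555, 3.00], [661, 3.43], [651, 3.36], [605, 3.13],
    [653, 3.12], [575, 2.74], [545, 2.76], [572, 2.88], [594, 2.96],
])

def normal_theory_bootstrap_se(data, B=1000, rng=None):
    """Parametric bootstrap: step (i) samples from F-hat_NORM instead of F-hat."""
    rng = np.random.default_rng(rng)
    n = len(data)
    mean = data.mean(axis=0)
    cov = np.cov(data, rowvar=False, bias=True)  # MLE covariance (divisor n)
    reps = np.empty(B)
    for b in range(B):
        sample = rng.multivariate_normal(mean, cov, size=n)           # step (i)
        reps[b] = np.corrcoef(sample[:, 0], sample[:, 1])[0, 1]       # step (ii)
    return reps.std(ddof=1)                                           # step (iii)

se_param = normal_theory_bootstrap_se(LAW, rng=0)
print(round(se_param, 3))
```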

The smooth curve in Figure 2 shows the results of carrying out this "normal theory bootstrap" on the law school data. Actually, there is no need to do the bootstrap sampling in this case, because of Fisher's formula for the sampling density of a correlation coefficient in the bivariate normal situation; see Chapter 32 of Johnson and Kotz (1970). The normal theory estimate (2.5) is a close approximation to $\sigma(\hat F_{\text{NORM}})$, the parametric bootstrap estimate of standard error.

In considering the merits or demerits of the bootstrap, it is worth remembering that all of the usual formulas for estimating standard errors, like one over the square root of the observed Fisher information, are essentially bootstrap estimates carried out in a parametric framework. This point is explained carefully in Section 5 of Efron (1981b). The straightforward nonparametric algorithm (i)-(iii) has the virtue of avoiding all parametric assumptions, all approximations (such as those involved in the Fisher information expression for the standard error of an MLE), and in fact all analytic difficulties of any kind. The data analyst is free to obtain standard errors for enormously complicated estimators, subject only to the constraints of computer time. Sections 3 and 6 discuss some interesting applied problems that are far too complicated for standard analyses.

How well does the bootstrap work? Table 1 shows the answer in one situation. Here $\mathcal{X}$ is the real line, n = 15, and the statistic $\hat\theta$ of interest is the 25% trimmed mean. If the true sampling distribution F is N(0, 1), then the true standard error is $\sigma(F) = .286$. The bootstrap estimate $\hat\sigma$ is nearly unbiased, averaging .287 in a large sampling experiment. The standard deviation of the bootstrap estimate $\hat\sigma$ is itself .071 in this case, with coefficient of variation .071/.287 = .25. [Notice that there are two levels of Monte Carlo involved in Table 1. First, the actual samples $\mathbf{x} = (x_1, x_2, \ldots, x_{15})$ are drawn from F. Then bootstrap samples $(x_1^*, x_2^*, \ldots, x_{15}^*)$ are drawn with $\mathbf{x}$ held fixed. The bootstrap samples evaluate $\hat\sigma$ for a fixed value of $\mathbf{x}$. The standard deviation .071 refers to the variability of $\hat\sigma$ due to the random choice of $\mathbf{x}$.]
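The two levels of Monte Carlo can be sketched as nested loops. The outer loop draws actual samples from F = N(0, 1), and the inner step bootstraps each one. This sketch assumes NumPy. It defines the 25% trimmed mean as the mean after dropping `int(0.25 * n)` points from each end, which is one common convention and not necessarily the one used for Table 1. The trial count, B, and the seed are arbitrary.

```python
import numpy as np

def trimmed_mean(x, prop=0.25):
    """Mean after dropping int(prop * n) points from each end (assumed convention).
    Works along the last axis, so it accepts a (B, n) array of bootstrap samples."""
    x = np.sort(np.asarray(x, dtype=float), axis=-1)
    n = x.shape[-1]
    k = int(prop * n)
    return x[..., k:n - k].mean(axis=-1)

def bootstrap_se_trimmed(x, B, rng):
    """Inner level: bootstrap standard error (2.4) with the sample x held fixed."""
    n = x.size
    idx = rng.integers(0, n, size=(B, n))   # B bootstrap samples at once
    return trimmed_mean(x[idx]).std(ddof=1)

def sampling_experiment(n=15, trials=200, B=200, seed=1):
    """Outer level: draw actual samples x from F = N(0, 1) and record sigma-hat for each."""
    rng = np.random.default_rng(seed)
    estimates = np.empty(trials)
    for t in range(trials):
        x = rng.standard_normal(n)
        estimates[t] = bootstrap_se_trimmed(x, B, rng)
    return estimates.mean(), estimates.std(ddof=1)

ave, sd = sampling_experiment()
print(round(ave, 3), round(sd, 3), round(sd / ave, 2))  # Ave, Sd, Coeff var of sigma-hat
```

The three printed numbers correspond to the "Ave", "Sd", and "Coeff var" columns of Table 1. They will not match the table exactly, because of Monte Carlo error and the assumed trimming convention.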

The jackknife, discussed in Section 10, is another common method of assigning nonparametric standard errors. The jackknife estimate $\hat\sigma_J$ is also nearly unbiased for $\sigma(F)$, but it has a higher coefficient of variation (CV). The minimum possible CV for a scale-invariant estimate of $\sigma(F)$, assuming full knowledge of the parametric model, is shown in brackets. The nonparametric bootstrap is seen to be moderately efficient in both cases considered in Table 1.
                           F standard normal          Second sampling distribution
                           Ave    Sd     Coeff var    Ave    Sd     Coeff var
Bootstrap σ̂ (B = 200)      .287   .071   .25          .242   .078   .32
Jackknife σ̂_J              .280   .084   .30          .224   .085   .38
True [Minimum C.V.]        .286          [.19]        .232          [.27]

Table 1. A sampling experiment comparing the bootstrap and jackknife estimates of standard error for the 25% trimmed mean, sample size n = 15.
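Section 10 gives the jackknife in full. For comparison with the bootstrap code above, here is a minimal sketch of the textbook jackknife standard error, $\hat\sigma_J = [\frac{n-1}{n}\sum_{i}(\hat\theta_{(i)} - \hat\theta_{(\cdot)})^2]^{1/2}$, where $\hat\theta_{(i)}$ is the statistic recomputed with $x_i$ deleted. This formula is supplied here as standard background; it is not taken from this section. The sketch assumes NumPy and uses the same hypothetical trimming convention as before. As a check, for the sample mean the jackknife reproduces formula (1.3) exactly.

```python
import numpy as np

def jackknife_se(x, statistic):
    """Jackknife standard error from the n leave-one-out values theta_(i)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    loo = np.array([statistic(np.delete(x, i)) for i in range(n)])
    return np.sqrt((n - 1) / n * np.sum((loo - loo.mean()) ** 2))

def trimmed_mean_25(x):
    """25% trimmed mean, dropping int(0.25 * n) points from each end (assumed convention)."""
    x = np.sort(x)
    k = int(0.25 * x.size)
    return x[k:x.size - k].mean()

rng = np.random.default_rng(2)
x = rng.standard_normal(15)
print(jackknife_se(x, trimmed_mean_25))
# For the sample mean the jackknife reproduces formula (1.3):
print(jackknife_se(x, np.mean), x.std(ddof=1) / np.sqrt(x.size))
```

The jackknife needs only n recomputations rather than B. Table 1 suggests the price is a noisier estimate.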

Table 2 returns to the case of $\hat\theta$, the correlation coefficient. Instead of real data, we have a sampling experiment in which the true F is bivariate normal with true correlation $\rho = .50$ and sample size n = 14. Table 2 is abstracted from a larger table in Efron (1981c), in which some of the methods for estimating a standard error required the sample size to be even.

The left side of Table 2 refers to $\hat\theta$, while the right side refers to $\hat\phi = \tanh^{-1}(\hat\theta) = .5\log\{(1+\hat\theta)/(1-\hat\theta)\}$. For each estimator of standard error, the root mean squared error of estimation, $[E(\hat\sigma - \sigma)^2]^{1/2}$, is given in the column headed $\sqrt{\text{MSE}}$.

The bootstrap was run with B = 128 and also with B = 512. In accordance with the results of Section 9, the larger value yields only slightly better estimates, and increasing B further would be pointless. It can be shown that $B = \infty$ gives $\sqrt{\text{MSE}} = .063$ for $\hat\theta$, only .001 less than B = 512. The normal theory estimate (2.5), which we know to be ideal for this sampling experiment, has $\sqrt{\text{MSE}} = .056$.

We can compromise between the totally nonparametric bootstrap estimate $\hat\sigma$ and the totally parametric bootstrap estimate $\hat\sigma_{\text{NORM}}$, as in lines 3, 4, and 5 of Table 2. Let $\hat\Sigma = \sum_{i=1}^n (x_i - \bar x)(x_i - \bar x)'/n$ be the sample covariance matrix of the observed data. The normal smoothed bootstrap draws the bootstrap sample from $\hat F \oplus N_2(0, .25\hat\Sigma)$, where $\oplus$ denotes convolution. This amounts to estimating F by an equal mixture of the n distributions $N_2(x_i, .25\hat\Sigma)$, that is, by a normal window estimate. Each point $x_j^*$ in a smoothed bootstrap sample is the sum of a randomly selected original data point $x_j$ and an independent bivariate normal point $z_j \sim N_2(0, .25\hat\Sigma)$. Smoothing makes little difference on the left side of the table, but is spectacularly effective in the $\hat\phi$ case. The latter result is suspect, since the true sampling distribution is bivariate normal and the function $\hat\phi = \tanh^{-1}\hat\theta$ is specifically chosen to have nearly constant standard error in the bivariate normal family. The uniform smoothed bootstrap samples from $\hat F \oplus U(0, .25\hat\Sigma)$, where $U(0, .25\hat\Sigma)$ is the uniform distribution on a rhombus chosen so that U has mean vector 0 and covariance matrix $.25\hat\Sigma$. It yields moderate reductions in $\sqrt{\text{MSE}}$ for both sides of the table.
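A minimal sketch of the normal smoothed bootstrap, assuming NumPy. Each bootstrap point is a resampled data point plus independent $N_2(0, .25\hat\Sigma)$ noise, with $\hat\Sigma$ computed with divisor n as defined above. It is applied to the law school data (not to the simulated samples behind Table 2), both on the $\hat\theta$ scale and the $\hat\phi$ scale. The function name, the `h2` parameter, B, and the seed are hypothetical.

```python
import numpy as np

# Law school data from Figure 1
LAW = np.array([
    [576, 3.39], [635, 3.30], [558, 2.81], [578, 3.03], [666, 3.44],
    [580, 3.07], [555, 3.00], [661, 3.43], [651, 3.36], [605, 3.13],
    [653, 3.12], [575, 2.74], [545, 2.76], [572, 2.88], [594, 2.96],
])

def corr(data):
    return np.corrcoef(data[:, 0], data[:, 1])[0, 1]

def normal_smoothed_bootstrap_se(data, statistic, B=200, h2=0.25, rng=None):
    """Sample from F-hat convolved with N(0, h2 * Sigma-hat), then apply (2.4)."""
    rng = np.random.default_rng(rng)
    n, p = data.shape
    centered = data - data.mean(axis=0)
    sigma_hat = centered.T @ centered / n                   # Sigma-hat, divisor n
    reps = np.empty(B)
    for b in range(B):
        idx = rng.integers(0, n, size=n)                    # randomly selected x_j
        z = rng.multivariate_normal(np.zeros(p), h2 * sigma_hat, size=n)  # z_j
        reps[b] = statistic(data[idx] + z)                  # x_j* = x_j + z_j
    return reps.std(ddof=1)

se_smooth = normal_smoothed_bootstrap_se(LAW, corr, B=500, rng=0)
se_smooth_phi = normal_smoothed_bootstrap_se(LAW, lambda d: np.arctanh(corr(d)), B=500, rng=0)
print(round(se_smooth, 3), round(se_smooth_phi, 3))
```

Setting `h2=0` recovers the ordinary nonparametric bootstrap. This makes the amount of smoothing an explicit tuning choice.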


                                            Summary Statistics for 200 Trials

                                     Standard Error Estimates for θ̂    Standard Error Estimates for φ̂
                                     Ave   Std Dev  CV    √MSE         Ave   Std Dev  CV    √MSE
1. Bootstrap B = 128                 .206  .066     .32   .067         .301  .065     .22   .065
2. Bootstrap B = 512                 .206  .063     .31   .064         .301  .062     .21   .062
3. Normal Smoothed Bootstrap B = 128 .200  .060     .30   .063         .296  .041     .14   .041
4. Uniform Smoothed Bootstrap B = 128.205  .061     .30   .062         .298  .058     .19   .058
5. Uniform Smoothed Bootstrap B = 512.205  .059     .29   .060         .296  .052     .18   .052
6. Jackknife                         .223  .085     .38   .085         .314  .090     .29   .091
7. Delta Method                      .175  .058     .33   .072         .244  .052     .21   .076
   (Infinitesimal Jackknife)
8. Normal Theory                     .217  .056     .26   .056         .302  0        0     .003
True Standard Error                  .218                              .299

Table 2. Estimates of standard error for the correlation coefficient $\hat\theta$ and for $\hat\phi = \tanh^{-1}\hat\theta$, sample size n = 14, distribution F bivariate normal with true correlation $\rho = .5$. From a larger table in Efron (1981c).


Line 7 of Table 2 refers to the delta method, which is the most common method of assigning nonparametric standard errors. Surprisingly, it is badly biased downward on both sides of the table. The delta method, also known as the method of statistical differentials, the Taylor series method, and the infinitesimal jackknife, is discussed in Section 10.


3. Examples.

In this section we apply bootstrap standard error estimation to some complicated statistics.

Example 1: Cox's proportional hazards model


The data for this example come from a study of leukemia remission times in mice, taken from Cox (1972). They consist of measurements of remission time (y) in weeks for two groups, treatment (x = 0) and control (x = 1), and a 0-1 variable ($\delta_i$) indicating whether the remission time is censored (0) or complete (1). There are 21 mice in each group.

The standard regression model for censored data is Cox's proportional hazards model (Cox 1972). It assumes that the hazard function $h(t \mid x)$, the probability of going into remission in the next instant given no remission up to time t for a mouse with covariate x, is of the form

$$h(t \mid x) = h_0(t)e^{\beta x}. \tag{3.1}$$


Here $h_0(t)$ is an arbitrary unspecified function. Since x here is a group indicator, this simply means that the hazard for the control group is $e^\beta$ times the hazard for the treatment group. The regression parameter $\beta$ is estimated independently of $h_0(t)$ by maximizing the so-called "partial likelihood"

$$PL = \prod_{i \in D} \frac{e^{\beta x_i}}{\sum_{j \in R_i} e^{\beta x_j}}, \tag{3.2}$$

where D is the set of indices of the failure times and $R_i$ is the set of indices of those at risk at time $y_i$. This maximization requires an iterative computer search.


The estimate $\hat\beta$ for these data turns out to be 1.51. Taken literally, this says that the hazard rate is $e^{1.51} = 4.53$ times higher in the control group than in the treatment group, so the treatment is very effective. What is the standard error of $\hat\beta$? The usual asymptotic maximum likelihood theory, one over the square root of the observed Fisher information, gives an estimate of .41. Despite the complicated nature of the estimation procedure, we can also estimate the standard error using the bootstrap. We sample with replacement from the triples $\{(y_1, x_1, \delta_1), \ldots, (y_{42}, x_{42}, \delta_{42})\}$. For each bootstrap sample $\{(y_1^*, x_1^*, \delta_1^*), \ldots, (y_{42}^*, x_{42}^*, \delta_{42}^*)\}$, we form the partial likelihood and maximize it numerically to produce the bootstrap estimate $\hat\beta^*$. A histogram of 1000 bootstrap values is shown in Figure 3.
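The resample-the-triples procedure can be sketched as follows. The mouse data are not listed in this section, so the sketch simulates a hypothetical censored data set of the same shape (21 treated, 21 control, exponential event and censoring times). These are not the real data. The log partial likelihood (3.2) is maximized by Newton-Raphson. Because bootstrap samples contain repeated observations, ties are handled by using the full risk set $\{j : y_j \ge y_i\}$ for each failure, which amounts to Breslow's approximation. All names, B, and seeds are illustrative.

```python
import numpy as np

def cox_beta(y, x, delta, iters=50, tol=1e-10):
    """Maximize log PL (3.2) for one covariate by Newton-Raphson.
    Risk set R_i = {j : y_j >= y_i}; tied times handled as in Breslow's approximation."""
    fail = np.flatnonzero(delta == 1)              # the index set D
    at_risk = y[None, :] >= y[fail][:, None]       # row k: indicator of R_i for i = fail[k]
    beta = 0.0
    for _ in range(iters):
        w = np.exp(beta * x)[None, :] * at_risk    # e^{beta x_j} for j in R_i, else 0
        s0 = w.sum(axis=1)
        s1 = (w * x).sum(axis=1)
        s2 = (w * x ** 2).sum(axis=1)
        score = np.sum(x[fail] - s1 / s0)          # derivative of log PL
        info = np.sum(s2 / s0 - (s1 / s0) ** 2)    # observed information
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta

def simulate(rng, n_per_group=21, beta=1.5):
    """Hypothetical censored data shaped like Example 1 (not the mouse data)."""
    x = np.repeat([0.0, 1.0], n_per_group)         # 0 = treatment, 1 = control
    t = rng.exponential(1.0 / np.exp(beta * x))    # event times with hazard e^{beta x}
    c = rng.exponential(2.0, size=x.size)          # independent censoring times
    return np.minimum(t, c), x, (t <= c).astype(int)

rng = np.random.default_rng(0)
y, x, delta = simulate(rng)
beta_hat = cox_beta(y, x, delta)

B = 200
boot = np.empty(B)
for b in range(B):
    idx = rng.integers(0, y.size, size=y.size)     # resample whole triples (y, x, delta)
    boot[b] = cox_beta(y[idx], x[idx], delta[idx])
print(round(beta_hat, 2), round(boot.std(ddof=1), 2))
```

The partial likelihood code is a black box as far as the bootstrap is concerned. Any other estimator could be substituted in the loop without changing the resampling logic.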

The bootstrap estimate of the standard error of $\hat\beta$ based on these 1000 numbers is .42. Although the bootstrap and standard estimates agree, it is interesting to note that the bootstrap distribution is skewed to the right. This leads us to ask: can we extract other information from the bootstrap distribution besides a standard error estimate? The answer is yes. In particular, the bootstrap distribution can be used to form a confidence interval for $\beta$, as we will see in Section 9. The shape of the bootstrap distribution will help determine the shape of the confidence interval.

Figure 3. Histogram of 1000 bootstrap replications for the mouse leukemia data.

In this example our resampling unit was the triple $(y_i, x_i, \delta_i)$, and we ignored the unique elements of the problem, i.e. the censoring and the particular model being used. In fact, there are other ways to bootstrap this problem. We will see this when we discuss bootstrapping censored data in Section 5.

Example 2: Linear and Projection Pursuit Regression

We illustrate an application of the bootstrap to standard linear least squares regression as well as to a nonparametric regression technique.

Consider the standard regression setup. We have n observations on a response Y and covariates $(X_1, X_2, \ldots, X_p)$. Denote the ith observed vector of covariates by $x_i = (x_{i1}, x_{i2}, \ldots, x_{ip})'$. The usual linear regression model assumes

$$E(Y_i) = \alpha + \sum_{j=1}^p \beta_j x_{ij}. \tag{3.3}$$

Friedman and Stuetzle (1981) introduced a more general model, the projection pursuit regression model

$$E(Y_i) = \sum_{j=1}^m s_j(a_j \cdot x_i). \tag{3.4}$$

The p-vectors $a_j$ are unit vectors ("directions"), and the functions $s_j(\cdot)$ are unspecified.

Estimation of $\{a_1, s_1(\cdot)\}, \ldots, \{a_m, s_m(\cdot)\}$ is performed in a forward stepwise manner as follows. Consider $\{a_1, s_1(\cdot)\}$. Given a direction $a_1$, $s_1(\cdot)$ is estimated by a nonparametric smoother (e.g. a running mean) of y on $a_1 \cdot x$. The projection pursuit regression algorithm searches over all unit directions to find the direction $\hat a_1$ and associated function $\hat s_1(\cdot)$ that minimize $\sum_{i=1}^n (y_i - s_1(a_1 \cdot x_i))^2$. Then residuals are taken and the next direction and function are determined. This process continues until no additional term significantly reduces the residual sum of squares.

Notice the relation of the projection pursuit regression model to the standard linear regression model. When the function $s_1(\cdot)$ is forced to be linear and is estimated by the usual least squares method, a one-term projection pursuit model is exactly the same as the standard linear regression model. That is, the fitted model $\hat s_1(\hat a_1 \cdot x_i)$ exactly equals the least squares fit $\hat\alpha + \sum_{j=1}^p \hat\beta_j x_{ij}$. This is because the least squares fit, by definition, finds the best direction and the best linear function of that direction. Note also that adding another linear term $s_2(a_2 \cdot x)$ would not change the fitted model, since the sum of two linear functions is another linear function.

Hastie and Tibshirani (1984) applied the bootstrap to the linear and projection pursuit regression models to assess the variability of the coefficients in each. The data they considered are taken from Breiman and Friedman (1984). The response Y is Upland atmospheric ozone concentration (ppm). The covariates are $X_1$ = Sandburg Air Force Base temperature (°C), $X_2$ = inversion base height (ft), $X_3$ = Daggot pressure gradient (mmHg), $X_4$ = visibility (miles), and $X_5$ = day of the year. There are 330 observations. The number of terms (m) in the model (3.4) is taken to be two. The projection pursuit algorithm chose directions $\hat a_1 = (.80, -.38, .37, -.24, -.14)'$ and $\hat a_2 = (.07, .16, .04, -.05, -.98)'$. These directions consist mostly of Sandburg Air Force Base temperature and day of the year, respectively. (We don't show graphs of the estimated functions $\hat s_1(\cdot)$ and $\hat s_2(\cdot)$, although in a full analysis of the data they would also be of interest.) Forcing $s_1(\cdot)$ to be linear results in the direction $\hat a_1 = (.90, -.37, .03, -.14, -.19)'$. These are just the usual least squares estimates, scaled so that $\sum_{j=1}^p \hat\beta_j^2 = 1$.

To assess the variability of the directions, a bootstrap sample is drawn with replacement from $(y_1, x_{11}, \ldots, x_{15}), \ldots, (y_{330}, x_{330,1}, \ldots, x_{330,5})$ and the projection pursuit algorithm is applied. Figures 4 and 5 show histograms of the directions $\hat a_1^*$ and $\hat a_2^*$ for 200 bootstrap replications. Figure 4 also shows (broken histogram) the bootstrap replications of $\hat a_1$ with $s_1(\cdot)$ forced to be linear.
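The linear case, with $s_1(\cdot)$ forced to be linear, is easy to reproduce in code. The direction is the least squares slope vector rescaled to unit length, and rows $(y_i, x_i)$ are resampled together. The ozone data are not reproduced here, so this sketch simulates hypothetical data with n = 330 and p = 5. It assumes NumPy's `np.linalg.lstsq`. Projection pursuit itself is not implemented here.

```python
import numpy as np

def ls_direction(X, y):
    """Least squares slopes (intercept included), scaled so that sum(beta_j^2) = 1."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    beta = coef[1:]
    return beta / np.linalg.norm(beta)

rng = np.random.default_rng(0)
n, p = 330, 5
X = rng.standard_normal((n, p))                        # synthetic covariates, not the ozone data
true_dir = np.array([0.9, -0.37, 0.03, -0.14, -0.19])
y = X @ true_dir + rng.standard_normal(n)

a_hat = ls_direction(X, y)
B = 200
boot = np.empty((B, p))
for b in range(B):
    idx = rng.integers(0, n, size=n)                   # resample (y_i, x_i) pairs
    boot[b] = ls_direction(X[idx], y[idx])
print(np.round(a_hat, 2))
print(np.round(boot.std(axis=0, ddof=1), 3))           # bootstrap SE of each direction component
```

Replacing `ls_direction` with a projection pursuit fit would give the nonlinear version described in the text, at much greater computational cost.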

The first direction of the projection pursuit model is quite stable and only slightly more variable than the corresponding linear regression direction. But the second direction is extremely unstable! It is clearly unwise to put any faith in the second direction of the original projection pursuit model.

Example 3: Cox's Model and Local Likelihood Estimation

In this example, we return to Cox's proportional hazards model described in Example 1, but with a few added twists.

The data that we will discuss come from the Stanford heart transplant program and are given in Miller and Halpern (1983). The response y is survival time in weeks after a heart transplant, the covariate x is age at transplant, and the 0-1 variable $\delta$ indicates whether the survival time is censored (0) or complete (1). There are measurements on 157 patients. A proportional hazards model with a quadratic term, i.e. $h(t \mid x) = h_0(t)e^{\beta_1 x + \beta_2 x^2}$, was fit to these data. Both $\hat\beta_1$ and $\hat\beta_2$ are highly significant. The broken curve in Figure 6 is $\hat\beta_1 x + \hat\beta_2 x^2$ as a function of x.

For comparison, Figure 6 shows (solid line) another estimate, computed using local likelihood estimation (Tibshirani and Hastie 1984). Given a general proportional hazards model of the form $h(t \mid x) = h_0(t)e^{s(x)}$, the local likelihood technique assumes nothing about the parametric form of s(x). Instead, it estimates s(x) nonparametrically using a kind of local averaging. The algorithm is very computationally intensive, and standard maximum likelihood theory cannot be applied.

Figure 4. Smoothed histograms of the bootstrapped coefficients for the first term in the projection pursuit regression model. Solid histograms are for the usual projection pursuit model; the dotted histograms are for linear $s(\cdot)$.

Figure 5. Smoothed histograms of the bootstrapped coefficients for the second term in the projection pursuit model.

A comparison of the two functions reveals an important qualitative difference. The parametric estimate suggests that the hazard decreases sharply up to age 34, then rises; the local likelihood estimate stays approximately constant up to age 45, then rises. Has the forced fitting of a quadratic function produced a misleading result? To answer this question, we can bootstrap the local likelihood estimate. We sample with replacement from the triples $\{(y_1, x_1, \delta_1), \ldots, (y_{157}, x_{157}, \delta_{157})\}$ and apply the local likelihood algorithm to each bootstrap sample. Figure 7 shows estimated curves from 20 bootstrap samples. Some of the curves are flat up to age 45; others are decreasing. Hence the original local likelihood estimate is highly variable in this region, and on the basis of these data we can't determine the true behaviour of the function there. A look back at the original data shows that while half of the patients were under 45, only 13% were under 30. Figure 7 also shows that the estimate is stable near the middle ages but unstable for the older patients.

4.  Other  Measures  of  Statistical  Error. 

So  far  we  have  discussed  statistical  error,  or  accuracy,  in  terms  of  the  standard  error.  It 
is  easy  to  assess  other  measures  of  statistical  error,  such  as  bias  or  prediction  error,  using  the 
bootstrap. 

Consider the estimation of bias. For a given statistic $\hat\theta(\mathbf{y})$ and a given parameter $\mu(F)$, let

$$R(\mathbf{y}, F) = \hat\theta(\mathbf{y}) - \mu(F). \qquad (4.1)$$

(It will help keep our notation clear to call the parameter of interest $\mu$ rather than $\theta$.) For example $\mu$ might be the mean of the distribution $F$, assuming the sample space $\mathcal{X}$ is the real line, and $\hat\theta$ the 25% trimmed mean. The bias of $\hat\theta$ for estimating $\mu$ is

$$\beta(F) = E_F R(\mathbf{y}, F) = E_F\{\hat\theta(\mathbf{y})\} - \mu(F). \qquad (4.2)$$

The notation $E_F$ indicates expectation with respect to the probability mechanism appropriate to $F$, in this case $\mathbf{y} = (x_1, x_2, \ldots, x_n)$ a random sample from $F$.

The bootstrap estimate of bias is

$$\hat\beta = \beta(\hat F) = E_{\hat F} R(\mathbf{y}^*, \hat F) = E_{\hat F}\{\hat\theta(\mathbf{y}^*)\} - \mu(\hat F). \qquad (4.3)$$

As in Section 2, $\mathbf{y}^*$ denotes a random sample from $\hat F$, i.e. a bootstrap sample.

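To numerically evaluate $\hat\beta$, all we do is change step (iii) of the bootstrap algorithm in Section 2 to

$$\hat\beta_B = \frac{1}{B}\sum_{b=1}^{B} R(\mathbf{y}^*(b), \hat F) = \frac{1}{B}\sum_{b=1}^{B} \hat\theta(\mathbf{y}^*(b)) - \mu(\hat F) = \hat\theta^*(\cdot) - \mu(\hat F). \qquad (4.4)$$

As $B \to \infty$, $\hat\beta_B$ goes to $\hat\beta$ as given in (4.3).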
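As a minimal sketch of (4.4) (our illustration, not the authors' program), the Python below replaces the standard-deviation step of the algorithm with an average of the bootstrap replications. The helper names `trimmed_mean` and `bootstrap_bias` are hypothetical, only the standard library is used, and the trimming convention (drop `int(0.25 * n)` order statistics from each end) is an assumption.

```python
import random
import statistics


def trimmed_mean(x, prop=0.25):
    """Mean after dropping int(prop * n) order statistics from each end.

    Assumption: one common trimming convention; the text does not say how a
    fractional number of trimmed points is handled.
    """
    xs = sorted(x)
    k = int(prop * len(xs))
    return statistics.fmean(xs[k:len(xs) - k])


def bootstrap_bias(x, stat, param, B=1000, seed=0):
    """Monte Carlo bootstrap bias estimate beta-hat_B of (4.4).

    stat(sample)  -> theta-hat evaluated on a sample
    param(sample) -> mu(F-hat), the parameter evaluated at the empirical distribution
    Returns (beta_hat_B, theta_star_dot), where theta_star_dot is theta-hat*(.).
    """
    rng = random.Random(seed)
    n = len(x)
    # Modified step (iii): average the B bootstrap replications theta-hat*(b)
    reps = [stat(rng.choices(x, k=n)) for _ in range(B)]
    theta_star_dot = statistics.fmean(reps)
    return theta_star_dot - param(x), theta_star_dot
```

Applied to the 54 values of Table 3, `trimmed_mean` gives about 2.24, the value of $\hat\theta$ used above; `bootstrap_bias(data, trimmed_mean, statistics.fmean)` then returns $\hat\beta_B$ and $\hat\theta^*(\cdot)$, whose exact values depend on the random seed.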

As an example consider the blood serum data of Table 3. Suppose we wish to estimate the true mean $\mu = \mu(F)$ of this population using $\hat\theta$, the 25% trimmed mean. We calculate $\hat\mu = \mu(\hat F) = 2.39$, the sample mean of the 54 observations, and $\hat\theta = 2.24$, the trimmed mean. The trimmed mean is lower because it discounts the effect of the large observations 6.4 and 9.4. It looks like the trimmed mean might be more robust for this type of data, and as a matter of fact a bootstrap analysis, $B = 1000$, gave estimated standard error $\hat\sigma = .16$ for $\hat\theta$, compared to .21 for the sample mean. But what about bias?

0.1, 0.1, 0.2, 0.4, 0.4, 0.6, 0.8, 0.8, 0.9, 0.9, 1.3, 1.3,
1.4, 1.5, 1.6, 1.6, 1.7, 1.7, 1.7, 1.8, 2.0, 2.0, 2.2, 2.2,
2.2, 2.3, 2.3, 2.4, 2.4, 2.4, 2.4, 2.4, 2.4, 2.5, 2.5, 2.5,
2.7, 2.7, 2.8, 2.9, 2.9, 2.9, 3.0, 3.1, 3.1, 3.2, 3.2, 3.3,
3.3, 3.5, 4.4, 4.5, 6.4, 9.4

Table 3. BHCG blood serum levels for 54 patients having metastasized breast cancer, presented in ascending order.

The same 1000 bootstrap replications which gave $\hat\sigma = .164$ also gave $\hat\theta^*(\cdot) = 2.29$, so

$$\hat\beta = 2.29 - 2.39 = -0.10 \qquad (4.5)$$

according to (4.4). (The estimated standard deviation of $\hat\beta_B - \hat\beta$ due to the limitations of having $B = 1000$ bootstraps is only 0.005 in this case, so we can ignore the difference between $\hat\beta_B$ and $\hat\beta$.) Whether or not a bias of magnitude $-0.10$ is too large depends on the context of the problem. If we attempt to remove the bias by subtraction, we get $\hat\theta - \hat\beta = 2.24 - (-0.10) = 2.34$, which is close to the sample mean 2.39. Removing bias in this way is frequently a bad idea, see Hinkley (1978), but at least the bootstrap analysis has given us a reasonable picture of the bias and standard error of $\hat\theta$.


Here is another measure of statistical accuracy, different than either bias or standard error. Let $\hat\theta(\mathbf{y})$ be the 25% trimmed mean and $\mu(F)$ be the mean of $F$, as in the serum example, and also let $\hat\tau(\mathbf{y})$ be the interquartile range, the distance between the 25th and 75th percentiles of the sample $\mathbf{y} = (x_1, x_2, \ldots, x_n)$. Define

$$R(\mathbf{y}, F) = \frac{\hat\theta(\mathbf{y}) - \mu(F)}{\hat\tau(\mathbf{y})}. \qquad (4.6)$$

$R$ is like a Student's t statistic, except that we have substituted the 25% trimmed mean for the sample mean, and the interquartile range for the standard deviation.


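Suppose we know the 5th and 95th percentiles of $R(\mathbf{y}, F)$, say $\rho^{(.05)}(F)$ and $\rho^{(.95)}(F)$, where the definition of $\rho^{(.05)}(F)$ is

$$\mathrm{Prob}_F\{R(\mathbf{y}, F) < \rho^{(.05)}(F)\} = .05, \qquad (4.7)$$

and similarly for $\rho^{(.95)}(F)$. The relationship $\mathrm{Prob}_F\{\rho^{(.05)} \le R < \rho^{(.95)}\} = .90$ combines with definition (4.6) to give a central 90% "t interval" for the mean $\mu(F)$,

$$\mu \in \left[\hat\theta - \hat\tau\,\rho^{(.95)},\ \hat\theta - \hat\tau\,\rho^{(.05)}\right]. \qquad (4.8)$$

Of course we don't know $\rho^{(.05)}(F)$ and $\rho^{(.95)}(F)$, but we can approximate them by their bootstrap estimates $\rho^{(.05)}(\hat F)$ and $\rho^{(.95)}(\hat F)$. A bootstrap sample $\mathbf{y}^*$ gives a bootstrap value of (4.6), $R(\mathbf{y}^*, \hat F) = (\hat\theta(\mathbf{y}^*) - \mu(\hat F))/\hat\tau(\mathbf{y}^*)$, where $\hat\tau(\mathbf{y}^*)$ is the interquartile range of the bootstrap data $x_1^*, x_2^*, \ldots, x_n^*$. For any fixed number $\rho$, the bootstrap estimate of $\mathrm{Prob}_F\{R < \rho\}$ based on $B$ bootstrap samples is

$$\#\{R(\mathbf{y}^*(b), \hat F) < \rho\}/B. \qquad (4.9)$$

By keeping track of the empirical distribution of $R(\mathbf{y}^*(b), \hat F)$, we can pick off the values of $\rho$ which make (4.9) equal .05 and .95. These approach $\rho^{(.05)}(\hat F)$ and $\rho^{(.95)}(\hat F)$ as $B \to \infty$.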

For the serum data, $B = 1000$ bootstrap replications gave $\rho^{(.05)}(\hat F) = -.303$ and $\rho^{(.95)}(\hat F) = .078$. Substituting these values into (4.8), and using the observed estimates $\hat\theta = 2.24$, $\hat\tau = 1.40$, gives

$$\mu \in [2.13, 2.66] \qquad (4.10)$$

as a central 90% "bootstrap t interval" for the true mean $\mu(F)$. This compares with the standard t interval based on 53 degrees of freedom, $\bar x \pm 1.67\hat\sigma = [2.04, 2.74]$. Here $\hat\sigma = .21$ is the usual estimate of standard error (1.3).
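The computation in (4.6)-(4.9) can be sketched in Python as follows. This is an illustration under stated assumptions: the function names are hypothetical, the interquartile range uses the default convention of `statistics.quantiles` (which need not match the one behind $\hat\tau = 1.40$), and bootstrap samples with zero interquartile range are skipped. The helper `t_interval` is just (4.8); with $\hat\theta = 2.24$, $\hat\tau = 1.40$ and the percentiles $-.303$ and $.078$ it reproduces (4.10), $[2.13, 2.66]$.

```python
import random
import statistics


def trimmed_mean(x, prop=0.25):
    xs = sorted(x)
    k = int(prop * len(xs))
    return statistics.fmean(xs[k:len(xs) - k])


def iqr(x):
    # Assumption: statistics.quantiles' default ("exclusive") convention for the
    # 25th and 75th percentiles; other conventions give slightly different values.
    q1, _, q3 = statistics.quantiles(x, n=4)
    return q3 - q1


def t_interval(theta_hat, tau_hat, rho_lo, rho_hi):
    """Interval (4.8): [theta-hat - tau-hat * rho_hi, theta-hat - tau-hat * rho_lo]."""
    return theta_hat - tau_hat * rho_hi, theta_hat - tau_hat * rho_lo


def bootstrap_t_interval(x, B=1000, alpha=0.05, seed=0):
    """Central 1 - 2*alpha bootstrap t interval for the mean, using R of (4.6)."""
    rng = random.Random(seed)
    n = len(x)
    mu_hat = statistics.fmean(x)          # mu(F-hat): mean of the empirical distribution
    r_star = []
    for _ in range(B):
        xs = rng.choices(x, k=n)
        tau_star = iqr(xs)
        if tau_star == 0:
            continue                      # R(y*, F-hat) undefined; skip this replication
        r_star.append((trimmed_mean(xs) - mu_hat) / tau_star)
    r_star.sort()
    # Values of rho at which the proportion (4.9) reaches alpha and 1 - alpha
    rho_lo = r_star[int(alpha * len(r_star))]
    rho_hi = r_star[min(len(r_star) - 1, int((1 - alpha) * len(r_star)))]
    return t_interval(trimmed_mean(x), iqr(x), rho_lo, rho_hi)
```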

It is interesting to notice that if we discard the 54th observation 9.4, then $\hat\sigma$ decreases to .16, and the Student's t interval $\bar x \pm 1.67\hat\sigma$ equals [2.12, 2.66], which is almost exactly the same as (4.10)! Bootstrap confidence intervals are discussed further in Sections 7 and 8. They require more bootstrap replications than does $\hat\sigma$, on the order of $B = 1000$ rather than $B = 50$ or 100. This point is discussed briefly in Section 9.

By now it should be clear that we can use any random variable $R(\mathbf{y}, F)$ to measure accuracy, not just (4.1) or (4.6), and then estimate $E_F R(\mathbf{y}, F)$ by its bootstrap value $E_{\hat F} R(\mathbf{y}^*, \hat F) \approx \sum_{b=1}^{B} R(\mathbf{y}^*(b), \hat F)/B$. Similarly we can estimate $E_F R(\mathbf{y}, F)^2$ by $E_{\hat F} R(\mathbf{y}^*, \hat F)^2$, etc. Efron (1983) considers the prediction problem, in which a training set of data is used to construct a prediction rule. A naive estimate of the prediction rule's accuracy is the proportion of correct guesses it makes on its own training set, but this can be greatly overoptimistic since the prediction rule is explicitly constructed to minimize errors on the training set. In this case, a natural choice of $R(\mathbf{y}, F)$ is the overoptimism, the difference between the naive estimate and the actual success rate of the prediction rule for new data. Efron (1983) gives the bootstrap estimate of overoptimism, and shows that it is closely related to cross-validation, the usual method of estimating overoptimism. The paper goes on to show that some modifications of the bootstrap estimate greatly outperform both cross-validation and the bootstrap.
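A minimal sketch of a bootstrap estimate of overoptimism is given below. To stay short it swaps in a different setting from the one in Efron (1983): squared prediction error for a straight-line least-squares fit instead of the proportion of correct guesses of a classification rule. Each bootstrap rule is fit to $\mathbf{y}^*$, and the optimism is estimated by the average of its error on the original data minus its error on its own training sample. All function names are hypothetical.

```python
import random
import statistics


def fit_line(pts):
    """Least-squares line through (t, y) pairs; returns (intercept, slope)."""
    ts = [t for t, _ in pts]
    ys = [y for _, y in pts]
    tbar, ybar = statistics.fmean(ts), statistics.fmean(ys)
    sxx = sum((t - tbar) ** 2 for t in ts)
    slope = sum((t - tbar) * (y - ybar) for t, y in pts) / sxx
    return ybar - slope * tbar, slope


def mean_sq_error(rule, pts):
    a, b = rule
    return statistics.fmean((y - (a + b * t)) ** 2 for t, y in pts)


def bootstrap_optimism(pts, B=200, seed=0):
    """Returns (apparent error, estimated optimism, optimism-corrected error)."""
    rng = random.Random(seed)
    n = len(pts)
    apparent = mean_sq_error(fit_line(pts), pts)      # naive, training-set error
    diffs = []
    for _ in range(B):
        boot = rng.choices(pts, k=n)
        if len({t for t, _ in boot}) < 2:
            continue                                  # a line needs two distinct t values
        rule = fit_line(boot)
        # error of the bootstrap rule on the original data minus on its own training set
        diffs.append(mean_sq_error(rule, pts) - mean_sq_error(rule, boot))
    optimism = statistics.fmean(diffs)
    return apparent, optimism, apparent + optimism
```

A positive optimism estimate says the apparent error is too flattering; adding it back gives a less biased estimate of error on new data.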

5.  More  Complicated  Data  Sets. 

The bootstrap is not restricted to situations where the data is a simple random sample from a single distribution. Suppose for instance that the data consists of two independent random samples,

$$U_1, U_2, \ldots, U_m \sim F \quad\text{and}\quad V_1, V_2, \ldots, V_n \sim G, \qquad (5.1)$$

where $F$ and $G$ are possibly different distributions on the real line. Suppose also that the statistic of interest is the Hodges-Lehmann shift estimate

$$\hat\theta = \mathrm{median}\{V_j - U_i,\ i = 1, \ldots, m,\ j = 1, \ldots, n\}. \qquad (5.2)$$

Having observed $U_1 = u_1, U_2 = u_2, \ldots, V_n = v_n$, we desire an estimate for $\sigma(F, G)$, the standard error of $\hat\theta$.

The bootstrap estimate of $\sigma(F, G)$ is $\hat\sigma = \sigma(\hat F, \hat G)$, where $\hat F$ is the empirical distribution of $u_1, u_2, \ldots, u_m$ and $\hat G$ is the empirical distribution of $v_1, v_2, \ldots, v_n$. It is easy to modify the Monte Carlo algorithm of Section 2 to numerically evaluate $\hat\sigma$. Let $\mathbf{y} = (u_1, u_2, \ldots, v_n)$ be the observed data vector. A bootstrap sample $\mathbf{y}^* = (u_1^*, \ldots, u_m^*, v_1^*, \ldots, v_n^*)$ consists of a random sample $U_1^*, \ldots, U_m^*$ from $\hat F$ and an independent random sample $V_1^*, \ldots, V_n^*$ from $\hat G$. With only this modification, steps (i) through (iii) of the Monte Carlo algorithm produce $\hat\sigma_B$, (2.4), approaching $\hat\sigma$ as $B \to \infty$.

Summary statistics for $\hat\sigma_B$

                      Average    St. Dev.    C.V.
    B = 100             .165       .030      .18
    B = 200             .166       .031      .19
    True $\sigma(F,G)$  .167

Table 4. Bootstrap estimate of standard error for the Hodges-Lehmann two-sample shift estimate; $m = 6$, $n = 9$; true distributions $F$ and $G$ both Uniform [0, 1]. The table shows summary statistics for $\hat\sigma_B$ over 100 trials of this situation.

Table 4 reports on a simulation experiment investigating how well the bootstrap works on this problem. 100 trials of situation (5.1) were run, with $m = 6$, $n = 9$, and $F$ and $G$ both Uniform [0, 1]. For each trial, both $B = 100$ and $B = 200$ bootstrap replications were generated. The bootstrap estimate $\hat\sigma_B$ was nearly unbiased for the true standard error $\sigma(F, G) = .167$ for either $B = 100$ or $B = 200$, with a quite small standard deviation from trial to trial. The improvement in going from $B = 100$ to $B = 200$ is too small to show up in this experiment.
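The two-sample modification can be sketched in Python, together with a small rerun of the Table 4 experiment. This is an illustration only: `hodges_lehmann`, `two_sample_bootstrap_se` and `simulate_table4` are hypothetical names, and the random number stream differs from the original study, so the summary statistics should only be of the same general size as Table 4, not identical to it.

```python
import random
import statistics


def hodges_lehmann(u, v):
    """Shift estimate (5.2): median of the m*n differences v_j - u_i."""
    return statistics.median([vj - ui for ui in u for vj in v])


def two_sample_bootstrap_se(u, v, stat, B=100, rng=None):
    """sigma-hat_B: resample u from F-hat and, independently, v from G-hat."""
    rng = rng if rng is not None else random.Random(0)
    reps = [stat(rng.choices(u, k=len(u)), rng.choices(v, k=len(v)))
            for _ in range(B)]
    return statistics.stdev(reps)     # sample standard deviation of the B replications


def simulate_table4(trials=100, m=6, n=9, B=100, seed=1):
    """Repeat the Table 4 set-up: F and G both Uniform[0, 1]."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(trials):
        u = [rng.random() for _ in range(m)]
        v = [rng.random() for _ in range(n)]
        estimates.append(two_sample_bootstrap_se(u, v, hodges_lehmann, B, rng))
    avg = statistics.fmean(estimates)
    sd = statistics.stdev(estimates)
    return avg, sd, sd / avg          # average, st. dev., coefficient of variation
```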

In practice, statisticians must often consider quite complicated data structures: time series models, multi-factor layouts, sequential sampling, censored and missing data, etc. Figure 8 illustrates how the bootstrap estimation process proceeds in a general situation. The actual probability mechanism $P$ which generates the observed data $\mathbf{y}$ belongs to some family $\mathcal{P}$ of possible probability mechanisms. In the Hodges-Lehmann example, $P = (F, G)$, a pair of distributions on the real line, $\mathcal{P}$ equals the family of all such pairs, and $\mathbf{y} = (u_1, u_2, \ldots, u_m, v_1, v_2, \ldots, v_n)$ is generated by random sampling $m$ times from $F$ and $n$ times from $G$.

We have a random variable of interest $R(\mathbf{y}, P)$, which depends on both $\mathbf{y}$ and the unknown model $P$, and we wish to estimate some aspect of the distribution of $R$. In the Hodges-Lehmann example, $R(\mathbf{y}, P) = \hat\theta(\mathbf{y}) - E_P\hat\theta$, and we estimated $\sigma(P) = [E_P R(\mathbf{y}, P)^2]^{1/2}$, the standard error of $\hat\theta$. As before, the notation $E_P$ indicates expectation when $\mathbf{y}$ is generated according to mechanism $P$.


[Figure 8 schematic. Top row: Family of Possible Probability Models $\mathcal{P}$ → Actual Probability Model $P$ → Observed Data $\mathbf{y}$ → Random Variable of Interest $R(\mathbf{y}, P)$. Bottom row: Estimated Probability Model $\hat P$ → Bootstrap Data $\mathbf{y}^*$ → Bootstrap Random Variable $R(\mathbf{y}^*, \hat P)$.]

Figure 8. A schematic illustration of the bootstrap process for a general probability model $P$. The expectation of $R(\mathbf{y}, P)$ is estimated by the bootstrap expectation of $R(\mathbf{y}^*, \hat P)$. The double arrow indicates the crucial step in applying the bootstrap.


We assume that we have some way of estimating the entire probability model $P$ from the data $\mathbf{y}$, producing the estimate called $\hat P$ in Figure 8. (In the two-sample problem, $\hat P = (\hat F, \hat G)$, the pair of empirical distributions.) This is the crucial step for the bootstrap. It can be carried out either parametrically or nonparametrically, by maximum likelihood or by some other estimation technique.

Once we have $\hat P$, we can use Monte Carlo methods to generate bootstrap data sets $\mathbf{y}^*$, according to the same rules by which $\mathbf{y}$ is generated from $P$. The bootstrap random variable $R(\mathbf{y}^*, \hat P)$ is observable, since we know $\hat P$ as well as $\mathbf{y}^*$, so the distribution of $R(\mathbf{y}^*, \hat P)$ can be found by Monte Carlo sampling. The bootstrap estimate of $E_P R(\mathbf{y}, P)$ is then $E_{\hat P} R(\mathbf{y}^*, \hat P)$, and likewise for estimating any other aspect of $R(\mathbf{y}, P)$'s distribution.
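The scheme of Figure 8 is generic enough to write as a higher-order function. In the sketch below (hypothetical names, standard library only) the caller supplies the three problem-specific pieces: `estimate_model` for the crucial step from $\mathbf{y}$ to $\hat P$, `generate` for drawing $\mathbf{y}^*$ from $\hat P$ by the same rules that produce $\mathbf{y}$ from $P$, and `R` for the random variable of interest.

```python
import random
import statistics


def general_bootstrap(y, estimate_model, generate, R, B=1000, seed=0):
    """Monte Carlo approximation to E_{P-hat} R(y*, P-hat), following Figure 8.

    estimate_model(y)     -> P_hat   (the crucial estimation step)
    generate(P_hat, rng)  -> y_star  (same rules that produce y from P)
    R(y_star, P_hat)      -> float   (the bootstrap random variable)
    """
    rng = random.Random(seed)
    P_hat = estimate_model(y)
    return statistics.fmean(R(generate(P_hat, rng), P_hat) for _ in range(B))
```

For the one-sample nonparametric bootstrap, `estimate_model` can simply return the data (the empirical distribution) and `generate` can resample it with replacement; a parametric bootstrap would instead return fitted parameters and draw from the fitted model.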

A regression model is a familiar example of a complicated data structure. We observe $\mathbf{y} = (y_1, y_2, \ldots, y_n)$, where

$$y_i = g(\beta, t_i) + \varepsilon_i, \quad i = 1, 2, \ldots, n. \qquad (5.3)$$

Here $\beta$ is a vector of unknown parameters we wish to estimate; for each $i$, $t_i$ is an observed vector of covariates; and $g$ is a known function of $\beta$ and $t_i$. The $\varepsilon_i$ are an i.i.d. sample from some unknown distribution $F$ on the real line,

$$\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n \overset{\text{iid}}{\sim} F, \qquad (5.4)$$

where $F$ is usually assumed to be centered at 0 in some sense, perhaps $E_F\{\varepsilon\} = 0$ or $\mathrm{Prob}_F\{\varepsilon < 0\} = .5$. The probability model is $P = (\beta, F)$; (5.3) and (5.4) describe the step $P \to \mathbf{y}$ in Figure 8. The covariates $t_1, t_2, \ldots, t_n$, like the sample size $n$ in the simple problem (1.1), are considered fixed at their observed values.

For every choice of $\beta$ we have a vector $\mathbf{g}(\beta) = (g(\beta, t_1), g(\beta, t_2), \ldots, g(\beta, t_n))$ of predicted values for $\mathbf{y}$. Having observed $\mathbf{y}$, we estimate $\beta$ by minimizing some measure of distance between $\mathbf{g}(\beta)$ and $\mathbf{y}$,

$$\hat\beta:\ \min_{\beta} D(\mathbf{y}, \mathbf{g}(\beta)). \qquad (5.5)$$

The most common choice of $D$ is $D(\mathbf{y}, \hat{\mathbf{y}}) = \sum_{i=1}^{n} (y_i - \hat y_i)^2$.

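How accurate is $\hat\beta$ as an estimate of $\beta$? Let $R(\mathbf{y}, P)$ equal the vector $\hat\beta - \beta$. A familiar measure of accuracy is the mean square error matrix

$$\Sigma(P) = E_P(\hat\beta - \beta)(\hat\beta - \beta)' = E_P R(\mathbf{y}, P) R(\mathbf{y}, P)'. \qquad (5.6)$$

The bootstrap estimate of accuracy $\hat\Sigma = \Sigma(\hat P)$ is obtained by following through Figure 8.

There is an obvious choice for $\hat P = (\hat\beta, \hat F)$ in this case. The estimate $\hat\beta$ is obtained from (5.5). Then $\hat F$ is the empirical distribution of the residuals,

$$\hat F:\ \text{mass } 1/n \text{ on } \hat\varepsilon_i = y_i - g(\hat\beta, t_i), \quad i = 1, \ldots, n. \qquad (5.7)$$

A bootstrap sample $\mathbf{y}^*$ is obtained by following rules (5.3), (5.4),

$$y_i^* = g(\hat\beta, t_i) + \varepsilon_i^*, \quad i = 1, 2, \ldots, n, \qquad (5.8)$$

where $\varepsilon_1^*, \varepsilon_2^*, \ldots, \varepsilon_n^*$ is an i.i.d. sample from $\hat F$. (Notice that the $\varepsilon_i^*$ are independent bootstrap variates, even though the $\hat\varepsilon_i$ are not independent variates in the usual sense.)

Each bootstrap sample $\mathbf{y}^*(b)$ gives a bootstrap value $\hat\beta^*(b)$,

$$\hat\beta^*(b):\ \min_{\beta} D(\mathbf{y}^*(b), \mathbf{g}(\beta)), \qquad (5.9)$$

as in (5.5). The estimate

$$\hat\Sigma_B = \frac{1}{B}\sum_{b=1}^{B} \{\hat\beta^*(b) - \hat\beta^*(\cdot)\}\{\hat\beta^*(b) - \hat\beta^*(\cdot)\}', \qquad \hat\beta^*(\cdot) = \frac{1}{B}\sum_{b=1}^{B} \hat\beta^*(b), \qquad (5.10)$$

approaches the bootstrap estimate $\hat\Sigma$ as $B \to \infty$. (We could just as well divide by $B - 1$ in (5.10).)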


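In the case of ordinary least squares regression, where $g(\beta, t_i) = \beta' t_i$ and $D(\mathbf{y}, \hat{\mathbf{y}}) = \sum_{i=1}^{n} (y_i - \hat y_i)^2$, Section 7 of Efron (1979) shows that the bootstrap estimate, $B = \infty$, can be calculated without Monte Carlo sampling, and is

$$\hat\Sigma = \hat\sigma^2 \Big(\sum_{i=1}^{n} t_i t_i'\Big)^{-1}, \qquad \hat\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} \hat\varepsilon_i^2. \qquad (5.11)$$

This is the usual Gauss-Markov answer, except for the divisor $n$ in the definition of $\hat\sigma^2$.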
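To see (5.11) numerically, the sketch below runs the residual bootstrap (5.7)-(5.10) for a straight-line fit with an intercept, so that the residuals sum to zero, and compares the Monte Carlo variance of the slope with the slope entry of $\hat\sigma^2(\sum t_i t_i')^{-1}$. For covariate vectors $(1, t_i)$ that entry reduces to $\hat\sigma^2 / \sum (t_i - \bar t)^2$. The simple-regression setting and the function names are assumptions made for brevity; for large $B$ the two numbers should agree up to Monte Carlo error.

```python
import random
import statistics


def ols(ts, ys):
    """Least-squares intercept and slope for y = b0 + b1 * t."""
    tbar, ybar = statistics.fmean(ts), statistics.fmean(ys)
    sxx = sum((t - tbar) ** 2 for t in ts)
    b1 = sum((t - tbar) * (y - ybar) for t, y in zip(ts, ys)) / sxx
    return ybar - b1 * tbar, b1


def residual_bootstrap_slope_var(ts, ys, B=2000, seed=0):
    """Bootstrap variance of the OLS slope, resampling residuals as in (5.7)-(5.10)."""
    rng = random.Random(seed)
    b0, b1 = ols(ts, ys)
    fitted = [b0 + b1 * t for t in ts]
    resid = [y - f for y, f in zip(ys, fitted)]           # F-hat: mass 1/n on each residual
    slopes = []
    for _ in range(B):
        eps_star = rng.choices(resid, k=len(ts))           # i.i.d. draws from F-hat
        y_star = [f + e for f, e in zip(fitted, eps_star)] # rule (5.8); covariates stay fixed
        slopes.append(ols(ts, y_star)[1])
    return statistics.pvariance(slopes)                    # divisor B, as in (5.10)


def gauss_markov_slope_var(ts, ys):
    """Slope entry of (5.11): sigma-hat^2 / sum (t_i - t-bar)^2, with divisor n."""
    b0, b1 = ols(ts, ys)
    n = len(ts)
    sigma2 = sum((y - b0 - b1 * t) ** 2 for t, y in zip(ts, ys)) / n
    tbar = statistics.fmean(ts)
    return sigma2 / sum((t - tbar) ** 2 for t in ts)
```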


There is another, simpler way to bootstrap a regression problem. We can consider each covariate-response pair $x_i = (t_i, y_i)$ to be a single data point obtained by simple random sampling from a distribution $F$. If the covariate vector $t_i$ is $p$-dimensional, $F$ is a distribution on $p + 1$ dimensions. Then we apply the bootstrap as described originally in Section 2 to the data set $x_1, x_2, \ldots, x_n \overset{\text{iid}}{\sim} F$.

The two bootstrap methods for the regression problem are asymptotically equivalent, but can perform quite differently in small-sample situations. The class of possible probability models $\mathcal{P}$ is different for the two methods. The simple method, described last, takes less advantage of the special structure of the regression problem. It does not give answer (5.11) in the case of ordinary least squares. On the other hand the simple method gives a trustworthy estimate of $\hat\beta$'s variability even if the regression model is not correct. The bootstrap, as outlined in Figure 8, is very general, but because of this generality there will often be more than one bootstrap solution for a given problem.
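The simpler method treats each $(t_i, y_i)$ pair as one draw from $F$. Below is a minimal sketch for the slope of a straight-line fit; the names are hypothetical, and resamples whose $t$ values all coincide are skipped because the slope is then undefined.

```python
import random
import statistics


def ols_slope(pairs):
    """Least-squares slope of y on t for a list of (t, y) pairs."""
    ts = [t for t, _ in pairs]
    ys = [y for _, y in pairs]
    tbar, ybar = statistics.fmean(ts), statistics.fmean(ys)
    sxx = sum((t - tbar) ** 2 for t in ts)
    return sum((t - tbar) * (y - ybar) for t, y in pairs) / sxx


def pairs_bootstrap_slope_se(pairs, B=1000, seed=0):
    """Resample whole (t_i, y_i) pairs as i.i.d. points from a (p+1)-dimensional F-hat."""
    rng = random.Random(seed)
    slopes = []
    for _ in range(B):
        boot = rng.choices(pairs, k=len(pairs))
        if len({t for t, _ in boot}) < 2:
            continue                  # slope undefined when all resampled t values coincide
        slopes.append(ols_slope(boot))
    return statistics.stdev(slopes)
```

Because the covariates are resampled together with the responses, this version does not rely on the errors being identically distributed across $t$, which is one way to see why it stays usable when the assumed regression model is wrong.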

As the final example of this Section, we discuss censored data. The ages of 97 men at a California retirement center, Channing House, were observed either at death (an uncensored observation) or at the time the study ended (a censored observation). The data set is $\mathbf{y} = \{(x_1, d_1), (x_2, d_2), \ldots, (x_{97}, d_{97})\}$, where $x_i$ was the age of the $i$th man observed, and

$$d_i = \begin{cases} 1 & \text{if } x_i \text{ uncensored} \\ 0 & \text{if } x_i \text{ censored}. \end{cases} \qquad (5.12)$$

Thus (777, 1) represents a Channing House man observed to die at age 777 months, while (843, 0) represents a man 843 months old when the study ended. His observation could be written "843+", and in fact $d_i$ is just an indicator for the absence or presence of a "+".

A typical data point $(X_i, D_i)$ can be thought of as generated in the following way: a real lifetime $X_i^\circ$ is selected randomly according to a survival curve

$$S^\circ(t) = \mathrm{Prob}\{X_i^\circ > t\}, \quad (0 \le t < \infty) \qquad (5.13)$$

and a censoring time $W_i$ is independently selected according to another survival curve

$$R(t) = \mathrm{Prob}\{W_i > t\}, \quad (0 \le t < \infty). \qquad (5.14)$$

The statistician gets to observe

$$X_i = \min\{X_i^\circ, W_i\} \qquad (5.15)$$

and

$$D_i = \begin{cases} 1 & \text{if } X_i = X_i^\circ \\ 0 & \text{if } X_i = W_i. \end{cases} \qquad (5.16)$$

Note: $1 - S^\circ(t)$ and $1 - R(t)$ are the cumulative distribution functions for $X_i^\circ$ and $W_i$, respectively; with censored data it is more convenient to consider survival curves than c.d.f.'s.


Under assumptions (5.13)-(5.16) there is a simple formula for the nonparametric MLE of $S^\circ(t)$, called the Kaplan-Meier estimator, Kaplan and Meier (1958). For convenience suppose $x_1 < x_2 < x_3 < \cdots < x_n$, $n = 97$. Then the Kaplan-Meier estimate is

$$\hat S^\circ(t) = \prod_{k=1}^{k_t} \left(\frac{n-k}{n-k+1}\right)^{d_k}, \qquad (5.17)$$

where $k_t$ is the value of $k$ such that $t \in [x_k, x_{k+1})$. In the case of no censoring, $\hat S^\circ(t)$ is equivalent to the observed empirical distribution of $x_1, x_2, \ldots, x_n$, but otherwise (5.17) corrects the empirical distribution to account for censoring. Likewise

$$\hat R(t) = \prod_{k=1}^{k_t} \left(\frac{n-k}{n-k+1}\right)^{1-d_k} \qquad (5.18)$$

is the Kaplan-Meier estimate of the censoring curve $R(t)$.


Figure 9 shows $\hat S^\circ(t)$ for the Channing House men. It crosses the 50% survival level at $\hat\theta = 1044$ months. Call this value the observed median lifetime. We can use the bootstrap to assign a standard error to the observed median.

The probability mechanism is $P = (S^\circ, R)$; $P$ produces $(X_i, D_i)$ according to (5.13)-(5.16), and $\mathbf{y} = \{(x_1, d_1), \ldots, (x_n, d_n)\}$ by $n = 97$ independent repetitions of this process. An obvious choice of the estimate $\hat P$ in Figure 8 is $(\hat S^\circ, \hat R)$, (5.17), (5.18). The rest of the bootstrap process is automatic: $\hat S^\circ$ and $\hat R$ replace $S^\circ$ and $R$ in (5.13), (5.14); $n$ pairs $(X_i^*, D_i^*)$ are independently generated according to rules (5.13)-(5.16), giving the bootstrap data set $\mathbf{y}^* = \{(x_1^*, d_1^*), \ldots, (x_n^*, d_n^*)\}$; finally the bootstrap Kaplan-Meier curve $\hat S^{\circ *}$ is constructed according to formula (5.17), and the bootstrap observed median $\hat\theta^*$ is read off from it. The bootstrap replications of $\hat\theta^*$ gave estimated standard error $\hat\sigma = 14.0$ months for $\hat\theta$. An estimated bias of 4.1 months was calculated as in (4.4). Efron (1981c) gives a fuller description.
Figure 9. Kaplan-Meier estimated survival curve for the Channing House men; $t$ = age in months. The median survival age is estimated to be 1,044 months (87 years).

Once again there is a simpler way to apply the bootstrap. Consider each pair $y_i = (x_i, d_i)$ as an observed point obtained by simple random sampling from a bivariate distribution $F$, and apply the bootstrap as described in Section 3 to the data set $y_1, y_2, \ldots, y_n \overset{\text{iid}}{\sim} F$. This method makes no use of the special structure (5.13)-(5.16). Surprisingly, it gives exactly the same answers as the more complicated bootstrap method described earlier, Efron (1981a).
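The simpler censored-data bootstrap is easy to sketch: resample the $(x_i, d_i)$ pairs, recompute the Kaplan-Meier curve (5.17), and read off its median. The Python below is an illustration, not the authors' code. The names are hypothetical, tied times are handled in the usual way (men censored at a death time still count as at risk), and replications whose curve never drops to 50% are simply skipped, an assumption that could bias the result if it happened often.

```python
import random
import statistics


def kaplan_meier(data):
    """Kaplan-Meier curve from (x_i, d_i) pairs, d_i = 1 if uncensored.

    Returns [(t, S(t))] at each distinct death time; with distinct times this
    is the product formula (5.17).
    """
    data = sorted(data)
    at_risk, surv, steps, i = len(data), 1.0, [], 0
    while i < len(data):
        t, deaths, censored = data[i][0], 0, 0
        while i < len(data) and data[i][0] == t:   # group tied times
            deaths += data[i][1]
            censored += 1 - data[i][1]
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            steps.append((t, surv))
        at_risk -= deaths + censored
    return steps


def km_median(data):
    """First death time at which the estimated survival falls to 50% or below."""
    for t, s in kaplan_meier(data):
        if s <= 0.5 + 1e-12:          # small tolerance for floating-point rounding
            return t
    return None                       # curve never reaches 50%


def bootstrap_median_se(data, B=500, seed=0):
    """Standard error of the KM median by resampling (x_i, d_i) pairs."""
    rng = random.Random(seed)
    reps = []
    for _ in range(B):
        m = km_median(rng.choices(data, k=len(data)))
        if m is not None:
            reps.append(m)
    return statistics.stdev(reps)
```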
6. Examples with more complicated data structures.

Example 1: Autoregressive Time Series Model

This example illustrates an application of the bootstrap to a famous time series.

The data are the Wolfer annual sunspot numbers for the years 1770-1889 (taken from Anderson 1976). Let the count for the $i$th year be $z_i$. After centering the data (replacing $z_i$ by $z_i - \bar z$), we fit a first order autoregressive model

$$z_i = \phi z_{i-1} + \varepsilon_i, \qquad (6.1)$$

where $\varepsilon_i \sim$ i.i.d. $N(0, \sigma^2)$. The estimate $\hat\phi$ turned out to be .815, with an estimated standard error, one over the square root of the Fisher information, of .053.

A bootstrap estimate of the standard error of $\hat\phi$ can be obtained as follows. Define the residuals $\hat\varepsilon_i = z_i - \hat\phi z_{i-1}$ for $i = 2, 3, \ldots, 120$. A bootstrap sample $z_1^*, z_2^*, \ldots, z_{120}^*$ is created by sampling $\varepsilon_2^*, \varepsilon_3^*, \ldots, \varepsilon_{120}^*$ with replacement from the residuals, then letting $z_1^* = z_1$ and $z_i^* = \hat\phi z_{i-1}^* + \varepsilon_i^*$, $i = 2, \ldots, 120$. Finally, after centering the time series $z_1^*, z_2^*, \ldots, z_{120}^*$, $\hat\phi^*$ is the estimate of the autoregressive parameter for this new time series. (We could, if we wished, sample the $\varepsilon_i^*$ from a fitted normal distribution.)
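The residual bootstrap for model (6.1) can be sketched in Python as below. Assumptions: $\phi$ is estimated by conditional least squares on the centered series (the text does not say which estimator was used), and the names are hypothetical. For a series of length 120 with $\phi$ near .8, the resulting standard error should be of the same order as the .053 and .055 quoted here.

```python
import random
import statistics


def fit_ar1(z):
    """Center the series, then least-squares estimate of phi in z_i = phi * z_{i-1} + eps_i."""
    zbar = statistics.fmean(z)
    c = [zi - zbar for zi in z]
    num = sum(c[i] * c[i - 1] for i in range(1, len(c)))
    den = sum(c[i - 1] ** 2 for i in range(1, len(c)))
    return num / den, c


def ar1_bootstrap_se(z, B=1000, seed=0):
    rng = random.Random(seed)
    phi_hat, c = fit_ar1(z)
    resid = [c[i] - phi_hat * c[i - 1] for i in range(1, len(c))]  # i = 2, ..., n
    reps = []
    for _ in range(B):
        eps = rng.choices(resid, k=len(resid))
        zs = [c[0]]                               # z_1* = z_1
        for e in eps:
            zs.append(phi_hat * zs[-1] + e)       # z_i* = phi-hat * z_{i-1}* + eps_i*
        reps.append(fit_ar1(zs)[0])               # fit_ar1 re-centers the new series
    return statistics.stdev(reps), reps
```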

A histogram of 1000 such bootstrap values $\hat\phi^*$ is shown in Figure 10.

The bootstrap estimate of standard error was .055, agreeing nicely with the usual formula. Note however that the distribution is skewed to the left, so a confidence interval for $\phi$ might be asymmetric about $\hat\phi$, as discussed in Sections 8 and 9.

Figure 10. Bootstrap histogram of $\hat\phi_1^*, \ldots, \hat\phi_{1000}^*$ for the Wolfer sunspot data, model (6.1).

In bootstrapping the residuals, we have assumed that the first order autoregressive model is correct. (Recall the discussion of regression models in Section 5.) In fact, the first order autoregressive model is far from adequate for this data. A fit of the second-order autoregressive model

$$z_i = \alpha z_{i-1} + \theta z_{i-2} + \varepsilon_i \qquad (6.2)$$

gave estimates $\hat\alpha = 1.37$, $\hat\theta = -.677$, both with an estimated standard error of .067, based on Fisher information calculations. We applied the bootstrap to this model, producing the histograms for $\hat\alpha_1^*, \ldots, \hat\alpha_{1000}^*$ and $\hat\theta_1^*, \ldots, \hat\theta_{1000}^*$ in Figures 11 and 12 respectively.

The bootstrap standard errors were .070 and .068 respectively, both close to the usual value. Note that the additional term has reduced the skewness of the first coefficient.


Example 2: Estimating a response transformation in regression

Box and Cox (1964) introduced a parametric family for estimating a transformation of the response in a regression. Given regression data $\{(x_1, y_1), \ldots, (x_n, y_n)\}$, their model takes the form

$$z_i(\lambda) = x_i \cdot \beta + \varepsilon_i, \qquad (6.3)$$

where $z_i(\lambda) = (y_i^\lambda - 1)/\lambda$ for $\lambda \ne 0$ and $\log y_i$ for $\lambda = 0$, and $\varepsilon_i \sim$ i.i.d. $N(0, \sigma^2)$. Estimates of $\lambda$ and $\beta$ are found by minimizing $\sum (z_i - x_i \cdot \beta)^2$.

Figure 11. Bootstrap histogram of $\hat\alpha_1^*, \ldots, \hat\alpha_{1000}^*$ for the Wolfer sunspot data, model (6.2).

Figure 12. Bootstrap histogram of $\hat\theta_1^*, \ldots, \hat\theta_{1000}^*$ for the Wolfer sunspot data, model (6.2).

Breiman and Friedman (1984) proposed a nonparametric solution for this problem. Their so-called ACE ("Alternating Conditional Expectation") model generalizes (6.3) to

$$s(y_i) = x_i \cdot \beta + \varepsilon_i, \qquad (6.4)$$

where $s(\cdot)$ is an unspecified smooth function. (In its most general form, ACE allows for transformations of the covariates as well.) The function $s(\cdot)$ and parameter $\beta$ are estimated in an alternating fashion, utilizing a nonparametric smoother to estimate $s(\cdot)$.

In the following example, taken from Friedman and Tibshirani (1984), we compare the Box and Cox procedure to ACE and use the bootstrap to assess the variability of ACE.

The data, from Box and Cox (1964), consist of a 3×3×3 experiment on the strength of yarns, the response $Y$ being number of cycles to failure, and the factors length of test specimen ($X_1$) (250, 300 or 350 mm), amplitude of loading cycle ($X_2$) (8, 9, or 10 mm), and load ($X_3$) (40, 45 or 50 gm). As in Box and Cox, we treat the factors as quantitative and allow only a linear term for each. Box and Cox found that a logarithmic transformation was appropriate, with their procedure producing a value of $-.06$ for $\lambda$ with an estimated 95 percent confidence interval of $(-.18, .06)$.

Figure 13 shows the transformation selected by the ACE algorithm. For comparison, the log function is plotted (normalized) on the same figure.

Figure 13. Estimated transformation from ACE and the log function for Box and Cox example.

The similarity is truly remarkable! In order to assess the variability of the ACE curve, we can apply the bootstrap. Since the $X$ matrix in this problem is fixed by design, we resampled from the residuals instead of from the $(x_i, y_i)$ pairs. The bootstrap procedure was the following:

    Calculate residuals $\hat\varepsilon_i = \hat s(y_i) - x_i \cdot \hat\beta$, $i = 1, 2, \ldots, n$
    Repeat B times
        Choose a sample $\varepsilon_1^*, \ldots, \varepsilon_n^*$ with replacement from $\hat\varepsilon_1, \ldots, \hat\varepsilon_n$
        Calculate $y_i^* = \hat s^{-1}(x_i \cdot \hat\beta + \varepsilon_i^*)$, $i = 1, 2, \ldots, n$
        Compute $\hat s^*(\cdot)$ = result of ACE algorithm applied to $(x_1, y_1^*), \ldots, (x_n, y_n^*)$
    End
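The loop above can be written generically, with the transformation fitter passed in as a function. In this sketch `fit` is a hypothetical stand-in for ACE (no ACE implementation is given here), the model is reduced to one covariate without intercept for brevity, and `fit_log` is a toy fitter that fixes $s = \log$ so that the code runs end to end.

```python
import math
import random


def residual_bootstrap_transform(x, y, fit, B=20, seed=0):
    """Residual bootstrap for an estimated response transformation s(.).

    `fit(x, y)` is a hypothetical stand-in for ACE. It must return a triple
    (s, s_inv, beta): the fitted monotone transformation, its inverse, and the
    coefficient in the one-covariate model s(y_i) = x_i * beta + eps_i.
    """
    rng = random.Random(seed)
    s, s_inv, beta = fit(x, y)
    # Residuals live on the s(.) scale, where they are assumed roughly i.i.d.
    resid = [s(yi) - xi * beta for xi, yi in zip(x, y)]
    replications = []
    for _ in range(B):
        eps = rng.choices(resid, k=len(resid))
        y_star = [s_inv(xi * beta + e) for xi, e in zip(x, eps)]
        replications.append(fit(x, y_star))      # (s*, s_inv*, beta*) for this replication
    return replications


def fit_log(x, y):
    """Toy stand-in for ACE: fixes s = log and fits beta by least squares through 0."""
    logs = [math.log(yi) for yi in y]
    beta = sum(xi * li for xi, li in zip(x, logs)) / sum(xi * xi for xi in x)
    return math.log, math.exp, beta
```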


The number of bootstrap replications $B$ was 20. Note that the residuals are computed on the $s(\cdot)$ scale, not the $y$ scale, because it is on the $s(\cdot)$ scale that the true residuals are assumed to be approximately i.i.d. The 20 estimated transformations, $\hat s_1^*(\cdot), \ldots, \hat s_{20}^*(\cdot)$, are shown in Figure 14.

Figure 14. Bootstrap replications of ACE transformations for Box and Cox example.

The tight clustering of the smooths indicates that the original estimate $\hat s(\cdot)$ has low variability, especially for smaller values of $y$. This agrees qualitatively with the short confidence interval for $\lambda$ in the Box and Cox analysis.
7. Bootstrap Confidence Intervals.

This section presents three closely related methods of using the bootstrap to set confidence intervals. The discussion is in terms of simple parametric models, where the logical basis of the bootstrap methods is easiest to see. Section 8 extends the methods to multiparameter and nonparametric models.

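We have discussed obtaining $\hat\sigma$, the estimated standard error of an estimator $\hat\theta$. In practice, $\hat\theta$ and $\hat\sigma$ are usually used together to form the approximate confidence interval

$$\theta \in \hat\theta \pm \hat\sigma z^{(\alpha)}, \qquad (1.7)$$

where $z^{(\alpha)}$ is the $100\cdot\alpha$ percentile point of a standard normal distribution. Interval (1.7) is claimed to have approximate coverage probability $1 - 2\alpha$. For the law school example of Section 2, the values $\hat\theta = .776$, $\hat\sigma = .115$, $z^{(.05)} = -1.645$, give $\theta \in [.587, .965]$ as an approximate 90% central interval for the true correlation coefficient.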

We will call (1.7) the standard interval for $\theta$. When working within parametric families like the bivariate normal, $\hat\sigma$ in (1.7) is usually obtained by differentiating the log likelihood function, see Section 5a of Rao (1973), though in the context of this paper we might prefer to use the parametric bootstrap estimate of $\sigma$, e.g. in Section 2.

The standard intervals are an immensely useful statistical tool. They have the great virtue of being automatic: a computer program can be written which produces (1.7) directly from the data $\mathbf{y}$ and the form of the density function for $\mathbf{y}$, with no further input required from the statistician. Nevertheless the standard intervals can be quite inaccurate, as Table 5 shows. The standard interval (1.7) is strikingly different from the exact normal-theory interval based on the assumption of a bivariate normal sampling distribution $F$.

In this case it is well known that it is better to make the transformation $\phi = \tanh^{-1}(\theta)$, $\hat\phi = \tanh^{-1}(\hat\theta)$, apply (1.7) on the $\phi$ scale, and then transform back to the $\theta$ scale. The resulting interval, line 3 of Table 5, is moved closer to the exact interval. However, there is nothing automatic about the $\tanh^{-1}$ transformation. For a different statistic than the correlation coefficient, or a different distributional family than the bivariate normal, we might very well need other tricks to make (1.7) perform satisfactorily.
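The two standard-type intervals can be checked with a few lines of Python. The sketch assumes the law school sample has $n = 15$ schools and uses the usual approximation $1/\sqrt{n - 3}$ for the standard error of $\hat\phi = \tanh^{-1}(\hat\theta)$; neither assumption is stated in this section, and the function names are hypothetical. With $\hat\theta = .776$ and $\hat\sigma = .115$ it gives $[.587, .965]$ for (1.7), and the transformed interval comes out as $[.508, .907]$.

```python
import math
from statistics import NormalDist


def standard_interval(theta_hat, sigma_hat, alpha=0.05):
    """Standard interval (1.7): theta-hat -/+ z * sigma-hat, z = -z^(alpha) > 0."""
    z = NormalDist().inv_cdf(1 - alpha)
    return theta_hat - z * sigma_hat, theta_hat + z * sigma_hat


def fisher_z_interval(r, n, alpha=0.05):
    """Apply (1.7) on the phi = atanh(theta) scale, then map back with tanh.

    Assumption: the standard error on the phi scale is 1 / sqrt(n - 3).
    """
    lo, hi = standard_interval(math.atanh(r), 1 / math.sqrt(n - 3), alpha)
    return math.tanh(lo), math.tanh(hi)
```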

The bootstrap can be used to produce approximate confidence intervals in an automatic way. The following discussion is abridged from Efron (1984a and b) and Efron (1982, Chapter 10). Line 4 of Table 5 shows that the parametric bootstrap interval for the correlation coefficient $\theta$ is nearly identical to the exact interval. "Parametric" in this case means that the bootstrap algorithm begins from the bivariate normal MLE $\hat F_{\mathrm{NORM}}$, as for the normal theory curve of Figure 2. This good performance is no accident. The bootstrap method used on line 4 in effect transforms $\hat\theta$ to the best (most normal) scale. All of this is done automatically by the bootstrap algorithm, without requiring special intervention from the statistician. The price paid is a large amount of computing, perhaps $B = 1000$ bootstrap replications, as discussed in Section 10.

                                          Interval        R/L
    1. Exact (Normal Theory):             [.496, .898]     .44
    2. Standard (1.7):                    [.587, .965]    1.00
    3. Transformed Standard:              [.508, .907]     .49
    4. Parametric Bootstrap (BC):         [.488, .900]     .43
    5. Nonparametric Bootstrap (BC_a):    [.43, .92]       .42

Table 5. Exact and approximate central 90% confidence intervals for $\theta$, the true correlation coefficient, from the law school data of Figure 1. R/L = ratio of right side of interval, measured from $\hat\theta = .776$, to left side. The exact interval is strikingly asymmetric about $\hat\theta$. Section 8 discusses the nonparametric method of line 5.

Define $\hat G(s)$ to be the parametric bootstrap c.d.f. of $\hat\theta^*$,

$$\hat G(s) = \mathrm{Prob}_*\{\hat\theta^* < s\}, \qquad (7.1)$$

where $\mathrm{Prob}_*$ indicates probability computed according to the bootstrap distribution of $\hat\theta^*$. In Figure 2, $\hat G(s)$ is obtained by integrating the normal theory curve. We will present three different kinds of bootstrap confidence intervals, in order of increasing generality. All three methods use percentiles of $\hat G$ to define the confidence interval. They differ in which percentiles are used.

The simplest method is to take $\theta \in [G^{-1}(\alpha), G^{-1}(1-\alpha)]$ as an approximate $1-2\alpha$ central interval for $\theta$. This is called the percentile method in Section 10.4 of Efron (1982). The percentile method interval is just the interval between the $100\cdot\alpha$ and $100\cdot(1-\alpha)$ percentiles of the bootstrap distribution of $\hat\theta^*$.
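As a rough illustration, the percentile method can be sketched in a few lines of Python once the Monte Carlo replications $\hat\theta^{*1}, \ldots, \hat\theta^{*B}$ are in hand. The function names and the rank convention used for the empirical inverse c.d.f. (the $k$th smallest replication, $k = \lceil \alpha B \rceil$) are assumptions made for this sketch, not part of the method as stated.

```python
import math

def empirical_quantile(replicates, q):
    """Approximate G^{-1}(q) by the k-th smallest replicate, k = ceil(q * B).

    This rank convention is an assumption; other conventions differ slightly.
    """
    ordered = sorted(replicates)
    b = len(ordered)
    # The small tolerance absorbs floating-point noise in q * B (e.g. 0.95 * 100).
    k = math.ceil(q * b - 1e-9)
    k = min(max(k, 1), b)  # clamp to a valid rank 1..B
    return ordered[k - 1]

def percentile_interval(replicates, alpha=0.05):
    """Central 1 - 2*alpha percentile interval [G^{-1}(alpha), G^{-1}(1 - alpha)]."""
    return (empirical_quantile(replicates, alpha),
            empirical_quantile(replicates, 1 - alpha))

# Toy replicates 1, 2, ..., 100 standing in for bootstrap values of theta-hat.
print(percentile_interval(list(range(1, 101))))  # (5, 95)
```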


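We will use the notation $\hat\theta[\alpha]$ for the $\alpha$-level endpoint of an approximate confidence interval for $\theta$, so $\theta \in [\hat\theta[\alpha], \hat\theta[1-\alpha]]$ is the central $1-2\alpha$ interval. Subscripts will be used to indicate the various different methods. The percentile interval has the endpoints

$$\hat\theta_P[\alpha] = G^{-1}(\alpha). \tag{7.2}$$

This compares with the standard interval,

$$\hat\theta_S[\alpha] = \hat\theta + \hat\sigma z^{(\alpha)}, \tag{7.3}$$

where $z^{(\alpha)}$ is the $100\cdot\alpha$ percentile point of a standard normal distribution, for example $z^{(.05)} = -1.645$.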

Suppose the bootstrap c.d.f. $G$ is perfectly normal, say

$$G(s) = \Phi\left(\frac{s - \hat\theta}{\hat\sigma}\right), \tag{7.4}$$

where $\Phi(z)$ is the standard normal c.d.f. In other words, suppose that $\hat\theta^*$ has bootstrap distribution $N(\hat\theta, \hat\sigma^2)$. In this case the standard method and the percentile method agree, $\hat\theta_S[\alpha] = \hat\theta_P[\alpha]$. In situations like that of Figure 2, where $G$ is markedly nonnormal, the standard interval is quite different from (7.2). Which is better?

To answer this question, consider the simplest possible situation, where for all $\theta$

$$\hat\theta \sim N(\theta, \sigma^2). \tag{7.5}$$

That is, we have a single unknown parameter $\theta$ with no nuisance parameters, and a single summary statistic $\hat\theta$ normally distributed about $\theta$ with constant standard error $\sigma$. In this case the parametric bootstrap c.d.f. is given by (7.4), so $\hat\theta_S[\alpha] = \hat\theta_P[\alpha]$. (The bootstrap estimate $\hat\sigma$ equals $\sigma$.)

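Suppose though that instead of (7.5) we have, for all $\theta$,

$$\hat\phi \sim N(\phi, \tau^2) \tag{7.6}$$

for some monotone transformation $\phi = g(\theta)$, $\hat\phi = g(\hat\theta)$, where $\tau$ is a constant. In the correlation coefficient example the function $g$ was $\tanh^{-1}$. The standard limits (7.3) can now be grossly inaccurate. However it is easy to verify that the percentile limits (7.2) are still correct: under (7.6) the parametric bootstrap distribution of $\hat\phi^*$ is $N(\hat\phi, \tau^2)$, so $G(s) = \Phi\{(g(s) - \hat\phi)/\tau\}$. "Correct" here means that (7.2) is the mapping of the obvious interval for $\phi$, $\hat\phi + \tau z^{(\alpha)}$, back to the $\theta$ scale, $\hat\theta_P[\alpha] = g^{-1}(\hat\phi + \tau z^{(\alpha)})$. It is also correct in the sense of having exactly the claimed coverage probability $1-2\alpha$.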

Another way to state things is that the percentile intervals are transformation invariant,

$$\hat\phi_P[\alpha] = g(\hat\theta_P[\alpha]) \tag{7.7}$$

for any monotone transformation $g$. This implies that if the percentile intervals are correct on some transformed scale $\phi = g(\theta)$, then they must also be correct on the original scale $\theta$. The statistician doesn't need to know the normalizing transformation $g$, only that it exists. Definition (7.2) automatically takes care of the bookkeeping involved in the use of normalizing transformations for confidence intervals.
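Property (7.7) can be checked numerically: an increasing $g$ preserves the ordering of the bootstrap replications, so the same replication sits at each rank on both scales. The sketch below uses $g = \tanh^{-1}$, the normalizing transformation of the correlation example; the replication values are made up for illustration and the rank convention ($k = \lceil qB \rceil$) is an assumption. For a decreasing $g$ the roles of $\alpha$ and $1-\alpha$ swap.

```python
import math

def empirical_quantile(replicates, q):
    """k-th smallest replicate with k = ceil(q * B), clamped to 1..B (assumed convention)."""
    ordered = sorted(replicates)
    k = min(max(math.ceil(q * len(ordered) - 1e-9), 1), len(ordered))
    return ordered[k - 1]

# Hypothetical bootstrap replications of a correlation coefficient (illustrative only).
theta_star = [0.41, 0.55, 0.62, 0.68, 0.71, 0.74, 0.77, 0.79, 0.81, 0.83,
              0.85, 0.86, 0.88, 0.89, 0.90, 0.91, 0.92, 0.93, 0.94, 0.96]
phi_star = [math.atanh(t) for t in theta_star]  # phi = g(theta) = tanh^{-1}(theta)

for alpha in (0.05, 0.95):
    theta_end = empirical_quantile(theta_star, alpha)  # endpoint on the theta scale
    phi_end = empirical_quantile(phi_star, alpha)      # endpoint on the phi scale
    # (7.7): the phi-scale endpoint equals g applied to the theta-scale endpoint.
    print(alpha, theta_end, phi_end, math.atanh(theta_end))
```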

Fisher's theory of maximum likelihood estimation says that we are always in situation (7.5) to a first order of asymptotic approximation. However we are also in situation (7.6) for any choice of $g$, to the same order of approximation. Efron (1984a and b) uses higher order asymptotic theory to differentiate between the standard and bootstrap intervals. It is the higher order asymptotic terms which often make exact intervals strongly asymmetric about the MLE, as in Table 5. The bootstrap intervals are effective at capturing this asymmetry.

The percentile method automatically incorporates normalizing transformations, as in going from (7.5) to (7.6). It turns out that there are two other important ways that assumption (7.5) can be misleading, the first of which relates to possible bias in $\hat\theta$. For example consider $f_\theta(\hat\theta)$, the family of densities for the observed correlation coefficient $\hat\theta$ when $n = 15$ independent pairs are drawn from a bivariate normal distribution with true correlation $\theta$. In fact it is easy to see that no monotone mapping $\phi = g(\theta)$, $\hat\phi = g(\hat\theta)$ transforms this family to $\hat\phi \sim N(\phi, \tau^2)$, as in (7.6). If there were such a $g$, then $\mathrm{Prob}_\theta\{\hat\theta < \theta\} = \mathrm{Prob}_\phi\{\hat\phi < \phi\} = .50$, but for $\theta = .776$ integrating the density function $f_{.776}(\hat\theta)$ gives $\mathrm{Prob}_{\theta=.776}\{\hat\theta < \theta\} = .431$.

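The bias-corrected percentile method (BC method) makes an adjustment for this type of bias. Let

$$z_0 = \Phi^{-1}\{G(\hat\theta)\}, \tag{7.8}$$

where $\Phi^{-1}$ is the inverse function of the standard normal c.d.f. The BC method has $\alpha$-level endpoint

$$\hat\theta_{BC}[\alpha] = G^{-1}\left(\Phi\{2z_0 + z^{(\alpha)}\}\right). \tag{7.9}$$

Note: if $G(\hat\theta) = .50$, that is if half of the bootstrap distribution of $\hat\theta^*$ is less than the observed value $\hat\theta$, then $z_0 = 0$ and $\hat\theta_{BC}[\alpha] = \hat\theta_P[\alpha]$. Otherwise definition (7.9) makes a bias correction.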
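Definitions (7.8) and (7.9) translate almost line for line into code. This sketch assumes the Monte Carlo replications are available as a list, uses Python's standard-library `statistics.NormalDist` for $\Phi$ and $\Phi^{-1}$, and adopts a simple rank convention for $G^{-1}$ ($k$th smallest replication, $k = \lceil qB \rceil$); the function names are hypothetical. The toy replications are arranged so that exactly half fall below $\hat\theta$, the case $z_0 = 0$ noted above.

```python
import math
from statistics import NormalDist

PHI = NormalDist()  # standard normal: PHI.cdf is Phi, PHI.inv_cdf is Phi^{-1}

def empirical_quantile(replicates, q):
    """k-th smallest replicate with k = ceil(q * B), clamped to 1..B (assumed convention)."""
    ordered = sorted(replicates)
    k = min(max(math.ceil(q * len(ordered) - 1e-9), 1), len(ordered))
    return ordered[k - 1]

def bias_correction(replicates, theta_hat):
    """z0 = Phi^{-1}{G(theta_hat)}, (7.8), with G(s) = fraction of replicates below s."""
    g = sum(t < theta_hat for t in replicates) / len(replicates)
    return PHI.inv_cdf(g)  # raises ValueError if g is 0 or 1

def bc_endpoint(replicates, theta_hat, alpha):
    """BC alpha-level endpoint G^{-1}(Phi{2 z0 + z^(alpha)}), (7.9)."""
    z0 = bias_correction(replicates, theta_hat)
    return empirical_quantile(replicates, PHI.cdf(2 * z0 + PHI.inv_cdf(alpha)))

reps = list(range(1, 101))
# Exactly half the replicates lie below 50.5, so z0 = 0 and BC equals the percentile method.
print(bc_endpoint(reps, 50.5, 0.05), bc_endpoint(reps, 50.5, 0.95))
```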

Section 10.7 of Efron (1982) shows that the BC interval for $\theta$ is exactly correct if

$$\hat\phi \sim N(\phi - z_0\tau, \tau^2) \tag{7.10}$$

for some monotone transformation $\phi = g(\theta)$, $\hat\phi = g(\hat\theta)$ and some constant $z_0$. It doesn't look like (7.10) is much more general than (7.6), but in fact the bias correction is often important.

In the example of Table 5, the percentile method (7.2) gives central 90% interval [.536, .911] compared to the BC interval [.488, .900]. By definition the endpoints .496 and .898 of the exact interval satisfy

$$\mathrm{Prob}_{\theta=.496}\{\hat\theta > .776\} = .05 = \mathrm{Prob}_{\theta=.898}\{\hat\theta < .776\}. \tag{7.11}$$

The corresponding quantities for the BC endpoints are

$$\mathrm{Prob}_{\theta=.488}\{\hat\theta > .776\} = .0465, \qquad \mathrm{Prob}_{\theta=.900}\{\hat\theta < .776\} = .0475, \tag{7.12}$$

compared to

$$\mathrm{Prob}_{\theta=.536}\{\hat\theta > .776\} = .0725, \qquad \mathrm{Prob}_{\theta=.911}\{\hat\theta < .776\} = .0293 \tag{7.13}$$

for the percentile endpoints. The bias correction is quite important in equalizing the error probabilities at the two endpoints. If $z_0$ can be approximated accurately (as mentioned in Section 9), then it is preferable to use the BC intervals.

Table 6 shows a simple example where the BC method is less successful. The data consists of the single observation $\hat\theta \sim \theta(\chi^2_{19}/19)$, the notation indicating an unknown scale parameter $\theta$ times a random variable with distribution $\chi^2_{19}/19$. In this case the BC interval based on $\hat\theta$ is a definite improvement over the standard interval (1.7), but goes only about half as far as it should toward achieving the asymmetry of the exact interval.

It turns out that the parametric family $\hat\theta \sim \theta(\chi^2_{19}/19)$ cannot be transformed into (7.10), not even approximately. The results of Efron (1982a) show that there does exist a monotone transformation $g$ such that $\phi = g(\theta)$, $\hat\phi = g(\hat\theta)$ satisfy to a high degree of approximation

$$\hat\phi \sim N(\phi - z_0\tau_\phi, \tau_\phi^2) \qquad (\tau_\phi = 1 + a\phi). \tag{7.14}$$

The constants in (7.14) are $z_0 = .1082$, $a = .1077$.

                                90% interval                         R/L
1. Exact                        $[.631\,\hat\theta,\ 1.88\,\hat\theta]$     2.38
2. Standard (1.7)               $[.466\,\hat\theta,\ 1.53\,\hat\theta]$     1.00
3. BC (7.9)                     $[.580\,\hat\theta,\ 1.69\,\hat\theta]$     1.64
4. BCa (7.15)                   $[.630\,\hat\theta,\ 1.88\,\hat\theta]$     2.37
5. Nonparametric BCa            $[.640\,\hat\theta,\ 1.68\,\hat\theta]$     1.88

Table 6. Central 90% confidence intervals for $\theta$, having observed $\hat\theta \sim \theta(\chi^2_{19}/19)$. The exact interval is sharply skewed to the right of $\hat\theta$. The BC method is only a partial improvement over the standard interval. The BCa interval, $a = .108$, agrees almost perfectly with the exact interval.

The BCa method, Efron (1984b), is a method of assigning bootstrap confidence intervals which are exactly right for problems which can be mapped into form (7.14). This method has $\alpha$-level endpoint

$$\hat\theta_{BC_a}[\alpha] = G^{-1}\left(\Phi\left\{z_0 + \frac{z_0 + z^{(\alpha)}}{1 - a\,(z_0 + z^{(\alpha)})}\right\}\right). \tag{7.15}$$

If $a = 0$ then $\hat\theta_{BC_a}[\alpha] = \hat\theta_{BC}[\alpha]$, but otherwise the BCa intervals can be a substantial improvement over the BC method, as shown in Table 6.
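To see (7.15) at work on the example of Table 6, the sketch below evaluates the BCa endpoints in closed form rather than by Monte Carlo. Here $G$ is the parametric bootstrap c.d.f. of $\hat\theta^* \sim \hat\theta(\chi^2_{19}/19)$, and $\hat\theta = 1$, so the endpoints are multiples of $\hat\theta$. The Python standard library has no chi-square quantile function, so the sketch substitutes the Wilson-Hilferty cube-root normal approximation for $G$. That approximation and the helper names are assumptions, not anything used in the text. With $a = .1077$ the endpoints should come out close to the $[.630\hat\theta, 1.88\hat\theta]$ of line 4, and $a = 0$ gives the BC endpoints (7.9).

```python
import math
from statistics import NormalDist

PHI = NormalDist()
NU = 19  # theta-hat ~ theta * chi2_19 / 19

# Wilson-Hilferty (assumed approximation): (chi2_nu / nu)^(1/3) is roughly normal
# with mean 1 - 2/(9 nu) and variance 2/(9 nu).
WH_MEAN = 1 - 2 / (9 * NU)
WH_SD = math.sqrt(2 / (9 * NU))

def G(s, theta_hat=1.0):
    """Approximate parametric bootstrap c.d.f. of theta* ~ theta_hat * chi2_nu / nu."""
    return PHI.cdf(((s / theta_hat) ** (1 / 3) - WH_MEAN) / WH_SD)

def G_inv(q, theta_hat=1.0):
    """Inverse of the approximate c.d.f. G above."""
    return theta_hat * (WH_MEAN + WH_SD * PHI.inv_cdf(q)) ** 3

def bca_endpoint(alpha, theta_hat, a):
    """BCa alpha-level endpoint, formula (7.15); a = 0 gives the BC endpoint (7.9)."""
    z0 = PHI.inv_cdf(G(theta_hat, theta_hat))  # bias correction (7.8)
    w = z0 + PHI.inv_cdf(alpha)
    return G_inv(PHI.cdf(z0 + w / (1 - a * w)), theta_hat)

lo = bca_endpoint(0.05, 1.0, a=0.1077)
hi = bca_endpoint(0.95, 1.0, a=0.1077)
print(round(lo, 3), round(hi, 3))  # compare with line 4 of Table 6
```

Because the Wilson-Hilferty cube root is itself a normalizing transformation for this family, the approximation error here is small.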

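The constant $z_0$ in (7.15) is given by $z_0 = \Phi^{-1}\{G(\hat\theta)\}$, (7.8), and so can be computed directly from the bootstrap distribution. How do we know $a$? It turns out that in one-parameter families $f_\theta(\hat\theta)$, a good approximation is

$$a \doteq \frac{1}{6}\,\mathrm{SKEW}_{\theta=\hat\theta}\left(\dot\ell_\theta(t)\right), \tag{7.16}$$

where $\mathrm{SKEW}_{\theta=\hat\theta}(\dot\ell_\theta(t))$ is the skewness at parameter value $\theta = \hat\theta$ of the score statistic $\dot\ell_\theta(t) = \frac{\partial}{\partial\theta}\log f_\theta(t)$. For $\hat\theta \sim \theta(\chi^2_{19}/19)$ this gives $a \doteq .1081$, compared to the actual value $a = .1077$ derived in Efron (1984b). For the normal theory correlation of Table 5, $a \doteq 0$, which explains why the BC method, which takes $a = 0$, works so well there.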
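Formula (7.16) can be checked for the $\chi^2_{19}$ example by simulation. Under $f_{\hat\theta}$ with $\hat\theta = 1$, the statistic $t \sim \chi^2_{19}/19$ is a gamma variable with shape 9.5 and scale $2/19$, and the score is $\dot\ell_\theta(t) = -19/(2\theta) + 19t/(2\theta^2)$. The sketch below estimates the skewness of the score from simulated draws and divides by 6. The sample size, seed, and function names are arbitrary choices made for illustration. Skewness is unchanged by the affine map from $t$ to the score, so the exact answer is $\sqrt{8/19}/6 \doteq .1081$, the skewness of $\chi^2_{19}$ divided by 6.

```python
import math
import random

NU = 19

def score(t, theta=1.0):
    """Score d/dtheta log f_theta(t) for t ~ theta * chi2_nu / nu."""
    return -NU / (2 * theta) + NU * t / (2 * theta ** 2)

def skewness(values):
    """Plug-in sample skewness: third central moment / (second central moment)^(3/2)."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

rng = random.Random(1)  # fixed seed, arbitrary
# chi2_nu / nu is gamma(shape = nu/2, scale = 2/nu); theta-hat = 1 here.
draws = [rng.gammavariate(NU / 2, 2 / NU) for _ in range(200_000)]
a_mc = skewness([score(t) for t in draws]) / 6  # formula (7.16), Monte Carlo
a_exact = math.sqrt(8 / NU) / 6                 # skewness of chi2_19 is sqrt(8/19)
print(round(a_mc, 3), round(a_exact, 4))
```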


The advantage of formula (7.16) is that we needn't know the transformation $g$ leading to (7.14) in order to approximate $a$. In fact $\hat\theta_{BC_a}[\alpha]$, like $\hat\theta_{BC}[\alpha]$ and $\hat\theta_P[\alpha]$, is transformation invariant, as in (7.7). Like the bootstrap methods, the BCa intervals are computed directly from the form of the density function $f_\theta(\cdot)$, for $\theta$ near $\hat\theta$.

Formula (7.16) applies to the case where $\theta$ is the only parameter. Section 8 briefly discusses the more challenging problem of setting confidence intervals for a parameter $\theta$ in a multiparameter family, and also in nonparametric situations where the number of nuisance parameters is effectively infinite.

To summarize this section, the progression from the standard intervals to the BCa method is based on a series of increasingly less restrictive assumptions, (7.5), (7.6), (7.10), and finally (7.14). Each step requires the statistician to do a greater amount of computation, first the bootstrap distribution $G$, then the bias-correction constant $z_0$, and finally the constant $a$. However all of these computations are algorithmic in character, and can be carried out in an automatic fashion.

Chapter 10 of Efron (1982) discusses several other ways of using the bootstrap to construct approximate confidence intervals, which will not be presented here. One of these methods, the "bootstrap t", was used in the blood serum example of Section 4.

8.  Nonparametric  and  Multiparameter  Confidence  Intervals. 

Section 7 focused on the simple case $\hat\theta \sim f_\theta$, where we have only a real-valued parameter $\theta$ and a real-valued summary statistic $\hat\theta$ from which we are trying to construct a confidence interval for $\theta$. Various favorable properties of the bootstrap confidence intervals were demonstrated in the simple case, but of course the simple case is where we least need a general method like the bootstrap.

Now we will discuss the more common situation where there are nuisance parameters besides the parameter of interest $\theta$, or, even more generally, the nonparametric case, where the number of nuisance parameters is effectively infinite. The discussion is limited to a few brief examples. Efron (1984a and b) develops the theoretical basis of bootstrap confidence intervals for complicated situations, and gives many more examples.
                                  for $\theta$       for $\phi$
1. Exact (Fieller):               [.29, .76]       [1.32, 3.50]
2. Parametric Boot (BC):          [.29, .76]       [1.32, 3.50]
3. Standard (1.7):                [.27, .73]       [1.08, 2.92]
MLE                               $\hat\theta = .5$       $\hat\phi = 2$

Table 7. Central 90% confidence intervals for $\theta = \eta_2/\eta_1$ and for $\phi = 1/\theta$, having observed $(y_1, y_2) = (8, 4)$ from a bivariate normal distribution $y \sim N_2(\eta, I)$. The BC intervals, line 2, are based on the parametric bootstrap distribution of $\hat\theta = y_2/y_1$.


Example  1:  Ratio  Estimation 

The data consists of $y = (y_1, y_2)$, assumed to come from a bivariate normal distribution with unknown mean vector $\eta$ and covariance matrix the identity,

$$y \sim N_2(\eta, I). \tag{8.1}$$

The parameter of interest, for which we desire a confidence interval, is the ratio

$$\theta = \eta_2/\eta_1. \tag{8.2}$$

Fieller (1954) provided well-known exact intervals for $\theta$; line 1 of Table 7 shows the Fieller interval having observed $y = (8, 4)$. Also shown is the Fieller interval for $\phi = 1/\theta = \eta_1/\eta_2$, which equals $[.76^{-1}, .29^{-1}]$, the obvious transformation of the interval for $\theta$. The standard interval (1.7) is satisfactory for $\theta$, but not for $\phi$. Notice that the standard interval does not transform correctly from $\theta$ to $\phi$.

Line 2 shows the BC intervals based on applying definitions (7.8), (7.9) to the parametric bootstrap distribution of $\hat\theta = y_2/y_1$ (or $\hat\phi = y_1/y_2$). This is the distribution of $\hat\theta^* = y_2^*/y_1^*$ when sampling $y^* = (y_1^*, y_2^*)$ from $\hat F_{\mathrm{NORM}} \sim N_2((y_1, y_2), I)$. The bootstrap intervals transform correctly, and in this case they agree with the exact interval to three decimal places.

Example  2:  Product  of  Normal  Means 


For most multiparameter situations, there do not exist exact confidence intervals for a single parameter of interest. Suppose for instance that (8.2) is changed to

$$\theta = \eta_1\eta_2, \tag{8.3}$$

still assuming (8.1). Table 8 shows approximate intervals for $\theta$, and also for $\phi = \theta^2$, having observed $y = (2, 4)$. The "almost exact" intervals are based on an analogue of Fieller's argument, Efron (1984a), which with suitable care can be carried through to a high degree of accuracy. Once again, the parametric BC intervals are a close match to line 1. The fact that the standard intervals do not transform correctly is particularly obvious here: the standard interval for $\phi = \theta^2$ even has a negative lower endpoint.

                                  for $\theta$          for $\phi$
1. Almost Exact:                  [1.77, 17.03]     [3.1, 290.0]
2. Parametric Boot (BC):          [1.77, 17.12]     [3.1, 293.1]
3. Standard (1.7):                [0.64, 15.36]     [-53.7, 181.7]
MLE                               $\hat\theta = 8$           $\hat\phi = 64$

Table 8. Central 90% confidence intervals for $\theta = \eta_1\eta_2$ and $\phi = \theta^2$, having observed $y = (2, 4)$, where $y \sim N_2(\eta, I)$. The almost exact intervals are based on the high order approximation theory of Efron (1984a). The BC intervals of line 2 are based on the parametric bootstrap distribution of $\hat\theta = y_1 y_2$.

The good performance of the parametric BC intervals is not accidental. The theory developed in Efron (1984a) shows that the BC intervals, based on bootstrapping the MLE $\hat\theta$, agree to high order with the almost exact intervals in the following class of problems: the data $y$ comes from a multiparameter family of densities $f_\eta(y)$, both $y$ and $\eta$ $k$-dimensional vectors; the real-valued parameter of interest $\theta$ is a smooth function of $\eta$, $\theta = t(\eta)$; and the family $f_\eta(y)$ can be transformed to multivariate normality, say

$$g(y) \sim N_k(h(\eta), I), \tag{8.4}$$

by some one-to-one transformations $g$ and $h$.

Just as in Section 7, it is not necessary for the statistician to know the normalizing transformations $g$ and $h$, only that they exist. The BC intervals are obtained directly from the original densities $f_\eta$: we find $\hat\eta = \hat\eta(y)$, the MLE of $\eta$; sample $y^* \sim f_{\hat\eta}$; compute $\hat\theta^*$, the bootstrap MLE of $\theta$; calculate $G$, the bootstrap c.d.f. of $\hat\theta^*$, usually by Monte Carlo sampling; and finally apply definitions (7.8), (7.9). This process gives the same interval for $\theta$ whether or not the transformation to form (8.4) has been made.
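The recipe just described can be sketched for the ratio problem of Table 7. Assuming the data $y = (8, 4)$ and model (8.1), the sketch draws $y^* \sim N_2(\hat\eta, I)$ with $\hat\eta = y$, forms $\hat\theta^* = y_2^*/y_1^*$, and applies (7.8) and (7.9). The number of replications, the seed, the function names, and the empirical-quantile convention ($k$th smallest replication, $k = \lceil qB \rceil$) are illustrative assumptions. Up to Monte Carlo error the endpoints should land near the Fieller interval [.29, .76] of line 1.

```python
import math
import random
from statistics import NormalDist

PHI = NormalDist()

def empirical_quantile(replicates, q):
    """k-th smallest replicate with k = ceil(q * B), clamped to 1..B (assumed convention)."""
    ordered = sorted(replicates)
    k = min(max(math.ceil(q * len(ordered) - 1e-9), 1), len(ordered))
    return ordered[k - 1]

def parametric_bc_ratio(y1, y2, alpha=0.05, n_boot=40_000, seed=0):
    """BC interval for theta = eta2/eta1 when y ~ N2(eta, I), sampling y* ~ N2(y, I)."""
    rng = random.Random(seed)
    theta_hat = y2 / y1  # MLE of theta
    reps = [rng.gauss(y2, 1.0) / rng.gauss(y1, 1.0) for _ in range(n_boot)]
    z0 = PHI.inv_cdf(sum(r < theta_hat for r in reps) / n_boot)  # (7.8)
    return tuple(empirical_quantile(reps, PHI.cdf(2 * z0 + PHI.inv_cdf(q)))  # (7.9)
                 for q in (alpha, 1 - alpha))

lo, hi = parametric_bc_ratio(8.0, 4.0)
print(round(lo, 3), round(hi, 3))  # compare with Fieller's [.29, .76]
```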

Not all problems can be transformed as in (8.4) to a normal distribution with constant covariance. The case considered in Table 6 is a one-dimensional counterexample. As a result the BC intervals do not always work as well as in Tables 7 and 8, though they usually improve on the standard method. However in order to take advantage of the BCa method, which is based on more general assumptions, we need to be able to calculate the constant $a$.

Efron (1984b) gives expressions for $a$ generalizing (7.16) to multiparameter families, and also to nonparametric situations. If (8.4) holds, then $a$ will have value zero, and the BCa method reduces to the BC case. Otherwise the two intervals differ.

Here we will discuss only the nonparametric situation: the observed data $y = (x_1, x_2, \ldots, x_n)$ consists of i.i.d. observations $X_1, \ldots, X_n \sim F$, where $F$ can be any distribution on the sample space $\mathcal{X}$; we want a confidence interval for $\theta = t(F)$, some real-valued functional of $F$; and the bootstrap intervals are based on bootstrapping $\hat\theta = t(\hat F)$, which is the nonparametric MLE of $\theta$. In this case a good approximation to the constant $a$ is given in terms of the empirical influence function $U_i^0$, defined in Section 10 at (10.11),

$$a \doteq \frac{1}{6}\,\frac{\sum_{i=1}^n (U_i^0)^3}{\left[\sum_{i=1}^n (U_i^0)^2\right]^{3/2}}. \tag{8.5}$$

This is a convenient formula, since it is easy to numerically evaluate the $U_i^0$ by simply substituting a small value of $\epsilon$ into (10.11).
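Formula (8.5) can be sketched as follows, evaluating each $U_i^0$ by substituting a small $\epsilon$ into (10.11) as the text suggests. The statistic is supplied as a function of a weight vector $p$, that is, as $\hat\theta(p) = \theta(\hat F(p))$. The weighted-mean example, the value $\epsilon = 10^{-6}$, and the function names are assumptions for illustration. For the mean, $U_i^0 = x_i - \bar x$ exactly, which gives an easy check.

```python
def empirical_influence(stat, n, eps=1e-6):
    """U_i^0 ~ [stat((1 - eps) p0 + eps delta_i) - stat(p0)] / eps, cf. (10.11)."""
    p0 = [1.0 / n] * n
    base = stat(p0)
    values = []
    for i in range(n):
        p = [(1 - eps) * w for w in p0]
        p[i] += eps  # move mass eps onto the i-th data point
        values.append((stat(p) - base) / eps)
    return values

def acceleration(u):
    """Nonparametric acceleration constant, formula (8.5)."""
    return sum(v ** 3 for v in u) / (6 * sum(v ** 2 for v in u) ** 1.5)

x = [0.0, 0.0, 0.0, 1.0]  # toy data, illustrative only

def weighted_mean(p):
    """theta(F(p)) for the mean: sum of p_i * x_i."""
    return sum(pi * xi for pi, xi in zip(p, x))

u = empirical_influence(weighted_mean, len(x))
print([round(v, 4) for v in u], round(acceleration(u), 4))
```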

Example  3:  The  Law  School  Data 

For $\theta$ the correlation coefficient, the values of $U_i^0$ corresponding to the 15 data points shown in Figure 1 are -1.507, .168, .273, .004, .525, -.049, -.100, .477, .310, .004, -.526, -.091, .323, .125, -.048. (Notice how influential law school 1 is.) Formula (8.5) gives $a = -.0817$. $B = 100{,}000$ bootstrap replications, about 100 times more than was actually necessary (see Section 9), gave $z_0 = -.0927$, and the central 90% interval $\theta \in [.43, .92]$ shown in Table 5. The nonparametric BCa interval is quite reasonable in this example, particularly considering that there is no guarantee that the true law school distribution $F$ is anywhere near a bivariate normal.

Example 4: Mouse Leukemia Data (the first example in Section 3)

The standard central 90% interval for $\beta$ in formula (3.1) is [.835, 2.18]. The bias-correction constant is $z_0 = .0275$, giving BC interval [1.00, 2.39]. This is shifted far right of the standard interval, reflecting the long right tail of the bootstrap histogram seen in Figure 3. We can calculate $a$ from (8.5), considering each of the $n = 42$ data points to be a triple $(y_i, x_i, \delta_i)$: $a = -.152$. Because $a$ is negative, the BCa interval is shifted back to the left, equaling [.788, 2.10]. This contrasts with the law school example, where $a$, $z_0$, and the skewness of the bootstrap distribution added to each other rather than cancelling out, resulting in a BCa interval much different from the standard interval.

Efron (1984b) provides some theoretical support for the nonparametric BCa method. However the problem of setting approximate nonparametric confidence intervals is still far from well understood, and all methods should be interpreted with some caution. We end this section with a cautionary example.

Example  5:  The  Variance 

Suppose the sample space $\mathcal{X}$ is the real line, and $\theta = \mathrm{Var}_F X$, the variance. Line 5 of Table 6 shows the result of applying the nonparametric BCa method to data sets $x_1, x_2, \ldots, x_{20}$ which were actually i.i.d. samples from a $N(0, 1)$ distribution. The number .640, for example, is the average of $\hat\theta_{BC_a}[.05]/\hat\theta$ over 40 such data sets, with $B = 4000$ bootstrap replications per data set. The upper limit $1.68\,\hat\theta$ is noticeably small, as pointed out by Schenker (1983). The reason is simple: the nonparametric bootstrap distribution of $\hat\theta^*$ has a shorter right tail than the parametric bootstrap distribution, which is a scaled $\chi^2_{19}$ random variable. The results of Beran (1984), Bickel and Freedman (1981), and Singh (1981) show that the nonparametric bootstrap distribution is highly accurate asymptotically, but of course that isn't a guarantee of good small-sample behavior. Bootstrapping from a smoothed version of $\hat F$, as in lines 3, 4, and 5 of Table 2, alleviates the problem in this particular example.
9. Bootstrap Sample Sizes.

How many bootstrap replications must we take? Consider the standard error estimate $\hat\sigma_B$ based on $B$ bootstrap replications, (2.4). As $B \to \infty$, $\hat\sigma_B$ approaches $\hat\sigma$, the bootstrap estimate of standard error as originally defined in (2.3). Because $\hat F$ does not estimate $F$ perfectly, $\hat\sigma = \sigma(\hat F)$ will have a non-zero coefficient of variation (CV) for estimating the true standard error $\sigma = \sigma(F)$; $\hat\sigma_B$ will have a larger CV because of the randomness added by the Monte Carlo bootstrap sampling.

It is easy to derive the following approximation,

$$CV(\hat\sigma_B) \doteq \left\{CV(\hat\sigma)^2 + \frac{E\{\hat\delta\} + 2}{4B}\right\}^{1/2}, \tag{9.1}$$

where $\hat\delta$ is the kurtosis of the bootstrap distribution of $\hat\theta^*$, given the data $y$ (standardized so that $\hat\delta = 0$ for a normal distribution), and $E\{\hat\delta\}$ its expected value averaged over $y$. For typical situations, $CV(\hat\sigma)$ lies between .10 and .30. For example if $\hat\theta = \bar x$, $n = 20$, $X_i \sim N(0, 1)$, then $CV(\hat\sigma) \doteq .16$.
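Formula (9.1) is simple enough to tabulate directly. The sketch below evaluates it with $E\{\hat\delta\} = 0$, the assumption under which Table 9 is constructed. The function name is hypothetical.

```python
import math

def cv_of_bootstrap_se(cv_limit, b, mean_kurtosis=0.0):
    """Approximate CV of sigma-hat_B from formula (9.1).

    cv_limit: CV of the ideal bootstrap estimate (B -> infinity);
    mean_kurtosis: E{delta-hat}, zero for a normal bootstrap distribution.
    """
    return math.sqrt(cv_limit ** 2 + (mean_kurtosis + 2) / (4 * b))

for cv in (0.25, 0.20, 0.15, 0.05, 0.0):
    print(cv, [round(cv_of_bootstrap_se(cv, b), 2) for b in (25, 50, 100, 200)])
```

The $B$-dependent term shrinks like $1/B$, so once $2/(4B)$ is small next to $CV(\hat\sigma)^2$, further replications buy little.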

Table 9 shows $CV(\hat\sigma_B)$ for various values of $B$ and $CV(\hat\sigma)$, assuming $E\{\hat\delta\} = 0$ in (9.1). For values of $CV(\hat\sigma) > .10$, there is little improvement past $B = 100$. In fact $B$ as small as 25 gives reasonable results. Even smaller values of $B$ can be quite informative, as we saw in the Stanford Heart Transplant Data figure of Section 3.

$CV(\hat\sigma)$      B = 25     50     100    200     ∞
   .25                 .29     .27    .26    .25    .25
   .20                 .24     .22    .21    .21    .20
   .15                 .21     .18    .17    .16    .15
   .05                 .15     .11    .09    .07    .05
   0                   .14     .10    .07    .05    0

Table 9. Coefficient of variation of the bootstrap estimate of standard error based on $B$ Monte Carlo replications, as a function of $B$ and $CV(\hat\sigma)$, the limiting CV as $B \to \infty$. Based on (9.1), assuming $E\{\hat\delta\} = 0$.
The situation is quite different for setting bootstrap confidence intervals. The calculations of Efron (1984b), Section 8, show that $B = 1000$ is a rough minimum for the number of Monte Carlo bootstraps necessary to compute the BC or BCa intervals. Somewhat smaller values, say $B = 250$, can give a useful percentile interval, the difference being that then the constant $z_0$ need not be computed. Confidence intervals are a fundamentally more ambitious measure of statistical accuracy than standard errors, so it is not surprising that they require more computational effort.


10.  The  Jackknife  and  the  Delta  Method. 

This section returns to the simple case of assigning a standard error to $\hat\theta(y)$, where $y = (x_1, \ldots, x_n)$ is obtained by random sampling from a single unknown distribution, $X_1, \ldots, X_n \sim F$. We will give another description of the bootstrap estimate $\hat\sigma$, which illustrates the bootstrap's relationship to older techniques of assigning standard errors, like the jackknife and the delta method.

For a given bootstrap sample $y^* = (x_1^*, \ldots, x_n^*)$, as described in step (i) of the algorithm in Section 2, let $p_i^*$ indicate the proportion of the bootstrap sample equal to $x_i$,

$$p_i^* = \frac{\#\{x_j^* = x_i\}}{n}, \qquad i = 1, 2, \ldots, n, \tag{10.1}$$

$p^* = (p_1^*, p_2^*, \ldots, p_n^*)$. The vector $p^*$ has a rescaled multinomial distribution

$$p^* \sim \mathrm{Mult}_n(n, p^0)/n \qquad \left(p^0 = (1/n, 1/n, \ldots, 1/n)\right), \tag{10.2}$$

where the notation indicates the proportions observed from $n$ random draws on $n$ categories, each with probability $1/n$.

For $n = 3$ there are 10 possible bootstrap vectors $p^*$. These are indicated in Figure 15 along with their multinomial probabilities from (10.2). For example, $p^* = (1/3, 0, 2/3)$, corresponding to $y^* = (x_1, x_3, x_3)$ or any permutation of these values, has bootstrap probability 1/9.
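For small $n$ the rescaled multinomial (10.2) can be enumerated outright. The sketch below lists every possible count vector for $n = 3$ with its probability $n!/(\prod_i c_i!)\,n^{-n}$, confirming the 10 points and the probability 1/9 quoted for $p^* = (1/3, 0, 2/3)$. Exact fractions keep the total at exactly 1. The function name is hypothetical.

```python
from fractions import Fraction
from itertools import product
from math import factorial

def bootstrap_points(n):
    """All p* = counts/n with their multinomial probabilities, as in (10.2)."""
    points = {}
    for counts in product(range(n + 1), repeat=n):
        if sum(counts) != n:
            continue  # keep only ways of distributing n draws over n categories
        ways = factorial(n)
        for c in counts:
            ways //= factorial(c)  # multinomial coefficient n! / (c_1! ... c_n!)
        p_star = tuple(Fraction(c, n) for c in counts)
        points[p_star] = Fraction(ways, n ** n)
    return points

pts = bootstrap_points(3)
print(len(pts), pts[(Fraction(1, 3), Fraction(0), Fraction(2, 3))], sum(pts.values()))
# 10 1/9 1
```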

To make our discussion easier, suppose that the statistic of interest is of functional form: $\hat\theta = \theta(\hat F)$, where $\theta(F)$ is a functional assigning a real number to any distribution $F$ on the sample space $\mathcal{X}$. The mean, the correlation coefficient, and the trimmed mean are all of functional form. Statistics of functional form have the same value as a function of $\hat F$, no matter what the sample size $n$ may be, which is convenient for discussing the jackknife and delta method.

Figure 15. The bootstrap and jackknife sampling points in the case $n = 3$. The bootstrap points (•) are shown with their probabilities.

For any vector $p = (p_1, p_2, \ldots, p_n)$ having non-negative weights summing to 1, define the weighted empirical distribution

$$\hat F(p):\ \text{probability } p_i \text{ on } x_i, \qquad i = 1, \ldots, n. \tag{10.3}$$

For $p = p^0$, the weighted empirical distribution equals $\hat F$, (1.4).

Corresponding to $p$ is a resampled value of $\hat\theta$,

$$\hat\theta(p) = \theta(\hat F(p)). \tag{10.4}$$

The shortened notation assumes that the data $(x_1, x_2, \ldots, x_n)$ are considered fixed. Notice that $\hat\theta(p^0) = \hat\theta(y)$ is the observed value of the statistic of interest. The bootstrap estimate $\hat\sigma$, (2.3), can then be written

$$\hat\sigma = \left[\mathrm{Var}_*\,\hat\theta(p^*)\right]^{1/2}, \tag{10.5}$$

where $\mathrm{Var}_*$ indicates variance with respect to distribution (10.2). In terms of Figure 15, $\hat\sigma$ is the standard deviation of the ten possible bootstrap values $\hat\theta(p^*)$, weighted as shown.

It looks like we could always calculate $\hat\sigma$ simply by doing a finite sum. Unfortunately the number of bootstrap points is $\binom{2n-1}{n}$, 77,558,760 for $n = 15$, so straightforward calculation of $\hat\sigma$ is usually impractical. That is why we have emphasized Monte Carlo approximations to $\hat\sigma$. Therneau (1983) considers the question of methods more efficient than pure Monte Carlo, but at present there is no generally better method available.
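The finite sum behind (10.5) is easy to write down, and the count $\binom{2n-1}{n}$ shows why it is impractical beyond tiny $n$. The sketch below computes $\hat\sigma$ exactly by enumerating all count vectors for a three-point data set. It uses the mean as the statistic, so the answer can be checked against the closed form $[\sum_i (x_i - \bar x)^2/n^2]^{1/2}$. The data values and function names are illustrative assumptions.

```python
import math
from itertools import product

def exact_bootstrap_se(x, stat):
    """sigma-hat = [Var_* stat(p*)]^(1/2), (10.5), summed over every bootstrap point."""
    n = len(x)
    first = second = 0.0
    for counts in product(range(n + 1), repeat=n):
        if sum(counts) != n:
            continue  # not a possible bootstrap count vector
        prob = math.factorial(n) / n ** n
        for c in counts:
            prob /= math.factorial(c)  # multinomial probability from (10.2)
        value = stat([c / n for c in counts], x)  # theta-hat(p*), (10.4)
        first += prob * value
        second += prob * value ** 2
    return math.sqrt(second - first ** 2)

def weighted_mean(p, x):
    """theta(F(p)) for the mean: sum_i p_i x_i."""
    return sum(pi * xi for pi, xi in zip(p, x))

x = [1.0, 2.0, 6.0]  # tiny illustrative data set
se = exact_bootstrap_se(x, weighted_mean)
xbar = sum(x) / len(x)
closed_form = math.sqrt(sum((xi - xbar) ** 2 for xi in x)) / len(x)
print(se, closed_form)
print(math.comb(2 * 15 - 1, 15))  # 77558760 bootstrap points when n = 15
```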


However there is another approach to approximating (10.5): we can replace the usually complicated function $\hat\theta(p)$ by an approximation linear in $p$, and then use the well-known formula for the multinomial variance of a linear function. The jackknife approximation $\hat\theta_J(p)$ is the linear function of $p$ which matches $\hat\theta(p)$, (10.4), at the $n$ points corresponding to the deletion of a single $x_i$ from the observed data set $x_1, x_2, \ldots, x_n$,

$$p_{(i)} = \frac{1}{n-1}\,(1, \ldots, 1, 0, 1, \ldots, 1) \qquad (0 \text{ in the } i\text{th place}), \tag{10.6}$$

$i = 1, 2, \ldots, n$. Figure 15 indicates the jackknife points for $n = 3$; because $\hat\theta$ is of functional form, (10.4), it doesn't matter that the jackknife points correspond to sample size $n - 1$ rather than $n$.

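The linear function $\hat\theta_J(p)$ is calculated to be

$$\hat\theta_J(p) = \hat\theta_{(\cdot)} + (p - p^0)^{\top} U, \tag{10.7}$$

where, in terms of $\hat\theta_{(i)} = \hat\theta(p_{(i)})$ and $\hat\theta_{(\cdot)} = \sum_{i=1}^n \hat\theta_{(i)}/n$, $U$ is the vector with $i$th coordinate

$$U_i = (n-1)\left(\hat\theta_{(\cdot)} - \hat\theta_{(i)}\right). \tag{10.8}$$

The jackknife estimate of standard error, Tukey (1958), Miller (1974), is

$$\hat\sigma_J = \left[\frac{n-1}{n}\sum_{i=1}^n \left(\hat\theta_{(i)} - \hat\theta_{(\cdot)}\right)^2\right]^{1/2}. \tag{10.9}$$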
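A direct sketch of (10.9) recomputes the statistic with each observation deleted, then combines the $n$ leave-one-out values. The statistic is passed as an ordinary function of a data list, which is an assumption of this sketch. For the sample mean, (10.9) reduces to the familiar $s/\sqrt{n}$ with $s^2 = \sum_i (x_i - \bar x)^2/(n-1)$, which serves as a check.

```python
import math
import statistics

def jackknife_se(x, stat):
    """Jackknife standard error (10.9) from the leave-one-out values theta_(i)."""
    n = len(x)
    loo = [stat(x[:i] + x[i + 1:]) for i in range(n)]  # theta_(i): x_i deleted
    center = sum(loo) / n                              # theta_(.)
    return math.sqrt((n - 1) / n * sum((t - center) ** 2 for t in loo))

x = [1.0, 2.0, 6.0, 4.0, 3.5]  # illustrative data
print(jackknife_se(x, statistics.fmean),
      statistics.stdev(x) / math.sqrt(len(x)))  # the two agree for the mean
```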
A standard multinomial calculation gives the following theorem (Efron 1982).

Theorem. The jackknife estimate of standard error equals $[n/(n-1)]^{1/2}$ times the bootstrap estimate of standard error for $\hat\theta_J$,

$$\hat\sigma_J = \left[\frac{n}{n-1}\right]^{1/2}\left[\mathrm{Var}_*\,\hat\theta_J(p^*)\right]^{1/2}. \tag{10.10}$$

(Since $\sum_i U_i = 0$, the covariance of the rescaled multinomial (10.2) gives $\mathrm{Var}_*\,\hat\theta_J(p^*) = \sum_i U_i^2/n^2$, while (10.9) equals $[\sum_i U_i^2/\{n(n-1)\}]^{1/2}$.)

In other words, the jackknife estimate is itself almost a bootstrap estimate applied to a linear approximation of $\hat\theta$. The factor $[n/(n-1)]^{1/2}$ in (10.10) makes $\hat\sigma_J^2$ unbiased for $\sigma^2$ in the case where $\hat\theta = \bar x$, the sample mean. We could multiply the bootstrap estimate $\hat\sigma$ by this same factor, and achieve the same unbiasedness, but there doesn't seem to be any consistent advantage to doing so. The jackknife requires $n$, rather than $B = 50$ to 200, resamples, at the expense of adding a linear approximation to the standard error estimate. Tables 1 and 2 indicate that there is some estimating efficiency lost in making this approximation. For statistics like the sample median which are difficult to approximate linearly, the jackknife is useless; see Section 3.4 of Efron (1982).
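The theorem can be verified numerically for any statistic, not just the mean. The sketch below uses a hypothetical data set and the plug-in variance as a nonlinear statistic. It forms $U$ from (10.8) and evaluates $\mathrm{Var}_*\hat\theta_J(p^*) = U^\top \Sigma U$, where $\Sigma = (\mathrm{diag}(p^0) - p^0 p^{0\top})/n$ is the covariance of the rescaled multinomial (10.2). It then compares $[n/(n-1)]^{1/2}$ times the square root with the jackknife estimate (10.9).

```python
import math

def plug_in_variance(x):
    """theta(F) = Var_F X evaluated at the empirical distribution (divisor n)."""
    m = sum(x) / len(x)
    return sum((v - m) ** 2 for v in x) / len(x)

x = [2.1, 3.4, 0.7, 5.9, 4.4, 1.8]  # hypothetical data
n = len(x)
loo = [plug_in_variance(x[:i] + x[i + 1:]) for i in range(n)]  # theta_(i)
center = sum(loo) / n                                          # theta_(.)
u = [(n - 1) * (center - t) for t in loo]                      # U_i, (10.8)

# Var_* of the linear function (p* - p0)'U when p* ~ Mult_n(n, p0)/n.
p0 = 1.0 / n
cov = [[((p0 if i == j else 0.0) - p0 * p0) / n for j in range(n)] for i in range(n)]
var_linear = sum(u[i] * cov[i][j] * u[j] for i in range(n) for j in range(n))

jack = math.sqrt((n - 1) / n * sum((t - center) ** 2 for t in loo))  # (10.9)
print(jack, math.sqrt(n / (n - 1)) * math.sqrt(var_linear))         # equal, (10.10)
```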


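There is a more obvious linear approximation to $\hat\theta(p)$ than $\hat\theta_J(p)$. Why not use the first-order Taylor series expansion for $\hat\theta(p)$ about the point $p = p^0$? This is the idea of Jaeckel's infinitesimal jackknife (1972). The Taylor series approximation turns out to be

$$\hat\theta_T(p) = \hat\theta(p^0) + (p - p^0)^{\top} U^0,$$

where

$$U_i^0 = \lim_{\epsilon \to 0} \frac{\hat\theta\left((1-\epsilon)p^0 + \epsilon\delta_i\right) - \hat\theta(p^0)}{\epsilon}, \tag{10.11}$$

$\delta_i$ being the $i$th coordinate vector. This suggests the infinitesimal jackknife estimate of standard error

$$\hat\sigma_{IJ} = \left[\mathrm{Var}_*\,\hat\theta_T(p^*)\right]^{1/2} = \left[\sum_{i=1}^n (U_i^0)^2 / n^2\right]^{1/2}, \tag{10.12}$$

with $\mathrm{Var}_*$ still indicating variance under (10.2). The ordinary jackknife can be thought of as taking $\epsilon = -1/(n-1)$ in the definition of $U_i^0$, while the infinitesimal jackknife lets $\epsilon \to 0$, thereby earning the name.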
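A minimal sketch of (10.11) and (10.12) uses a finite $\epsilon$ in place of the limit. The choice $\epsilon = 10^{-6}$, the data, and the function names are assumptions. Passing $\epsilon = -1/(n-1)$ instead puts zero weight on $x_i$, which is exactly the jackknife point $p_{(i)}$ of (10.6). This gives the ordinary jackknife values $(n-1)(\hat\theta - \hat\theta_{(i)})$, which differ from (10.8) only in using $\hat\theta$ in place of $\hat\theta_{(\cdot)}$. For the mean both versions give $U_i = x_i - \bar x$.

```python
import math

def influence_values(stat_p, n, eps=1e-6):
    """U_i^0 ~ [theta((1 - eps) p0 + eps delta_i) - theta(p0)] / eps, cf. (10.11)."""
    p0 = [1.0 / n] * n
    base = stat_p(p0)
    u = []
    for i in range(n):
        p = [(1 - eps) * w for w in p0]
        p[i] += eps  # shift mass eps onto the i-th observation
        u.append((stat_p(p) - base) / eps)
    return u

def infinitesimal_jackknife_se(stat_p, n, eps=1e-6):
    """sigma-hat_IJ = [sum_i (U_i^0)^2 / n^2]^(1/2), (10.12)."""
    return math.sqrt(sum(v * v for v in influence_values(stat_p, n, eps))) / n

x = [1.0, 2.0, 6.0, 4.0, 3.5]  # illustrative data

def weighted_mean(p):
    """theta(F(p)) for the mean."""
    return sum(pi * xi for pi, xi in zip(p, x))

n = len(x)
print(infinitesimal_jackknife_se(weighted_mean, n))
# eps = -1/(n-1) gives the jackknife point p_(i) of (10.6):
print(influence_values(weighted_mean, n, eps=-1 / (n - 1)))
```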


The $U_i^0$ are values of what Mallows (1974) calls the empirical influence function. Their definition is a nonparametric estimate of the true influence function

$$IF(x) = \lim_{\epsilon \to 0} \frac{\theta\left((1-\epsilon)F + \epsilon\delta_x\right) - \theta(F)}{\epsilon},$$

$\delta_x$ being the degenerate distribution putting mass 1 on $x$. The right side of (10.12) is then the obvious estimate of the influence function approximation to the standard error of $\hat\theta$ (Hampel 1974), $\sigma(F) \doteq \left[\int IF^2(x)\,dF(x)/n\right]^{1/2}$. The empirical influence function method and the infinitesimal jackknife give identical estimates of standard error.

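How have statisticians gotten along for so many years without methods like the jackknife and bootstrap? The answer is the delta method, which is still the most commonly used device for approximating standard errors. The method applies to statistics of the form $t(\bar Q_1, \bar Q_2, \ldots, \bar Q_A)$, where $t(\cdot, \cdot, \ldots, \cdot)$ is a known function and each $\bar Q_a$ is an observed average, $\bar Q_a = \sum_{i=1}^n Q_a(X_i)/n$. For example the correlation $\hat\theta$ is a function of $A = 5$ such averages: the average of the first coordinate values, the second coordinates, the first coordinates squared, the second coordinates squared, and the cross-products.

In its nonparametric formulation, the delta method works by (a) expanding $t$ in a linear Taylor series about the expectations of the $\bar Q_a$; (b) evaluating the standard error of the Taylor series using the usual expressions for variances and covariances of averages; and (c) substituting $\gamma(\hat F)$ for any unknown quantity $\gamma(F)$ occurring in (b). For example, the nonparametric delta method estimates the standard error of the correlation $\hat\theta$ by

$$\left[\frac{\hat\theta^2}{4n}\left(\frac{\hat\mu_{40}}{\hat\mu_{20}^2} + \frac{\hat\mu_{04}}{\hat\mu_{02}^2} + \frac{2\hat\mu_{22}}{\hat\mu_{20}\hat\mu_{02}} + \frac{4\hat\mu_{22}}{\hat\mu_{11}^2} - \frac{4\hat\mu_{31}}{\hat\mu_{11}\hat\mu_{20}} - \frac{4\hat\mu_{13}}{\hat\mu_{11}\hat\mu_{02}}\right)\right]^{1/2},$$

where, in terms of $x_i = (y_i, z_i)$, $\hat\mu_{gh} = \sum_{i=1}^n (y_i - \bar y)^g (z_i - \bar z)^h / n$ (Cramér 1946, p. 359).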

Theorem.  For  statistics  of  the  form  #  =  f(<Ji,  •  •  • ,  (J^),  the  nonparametric  delta  method 
and  the  infinitesimal  jackknife  give  the  same  estimate  of  standard  error  (Efron  1981b). 

The infinitesimal jackknife, the delta method, and the empirical influence function approach are three names for the same method. Notice that the results reported in line 7 of Table 2 show a severe downward bias. Efron and Stein (1981) show that the ordinary jackknife is always biased upwards, in a sense made precise in that paper. In the authors' opinion the ordinary jackknife is the method of choice if one does not want to do the bootstrap computations.
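This section does not restate the ordinary jackknife estimate. The minimal Python sketch below therefore assumes its standard form $\hat\sigma_J = \left[\frac{n-1}{n}\sum_{i=1}^n(\hat\theta_{(i)} - \hat\theta_{(\cdot)})^2\right]^{1/2}$. Here $\hat\theta_{(i)}$ is the statistic recomputed with observation $i$ deleted, and $\hat\theta_{(\cdot)}$ is their average. The function name `jackknife_se` is hypothetical.

```python
import math
from typing import Callable, Sequence, TypeVar

X = TypeVar("X")


def jackknife_se(stat: Callable[[Sequence[X]], float], data: Sequence[X]) -> float:
    """Ordinary jackknife standard error:
    [ (n-1)/n * sum_i (theta_(i) - theta_(.))^2 ]^(1/2)."""
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    # theta_(i): the statistic with observation i deleted
    leave_one_out = [stat(list(data[:i]) + list(data[i + 1:])) for i in range(n)]
    mean_loo = sum(leave_one_out) / n          # theta_(.)
    ss = sum((v - mean_loo) ** 2 for v in leave_one_out)
    return math.sqrt((n - 1) / n * ss)
```

Only $n$ recomputations of the statistic are needed, far fewer than a bootstrap run. For the sample mean the jackknife gives $s/\sqrt{n}$ with divisor $n-1$ in $s$. The infinitesimal jackknife gives the same quantity with divisor $n$.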
Appendix


BOOTSTRAP PROGRAM

The following FORTRAN program bootstraps the statistic defined by the user-specified function THETA. The explanatory comments (set in italics in the original listing) are annotations only and do not affect the computation. Note that the random number routines IRAND and RAND are installation dependent.


      REAL T(100), TSTAR(100), THSTAR(1000)
      EXTERNAL THETA
      N = 100                        ! sample size
      NBOOT = 1000                   ! number of bootstraps
      DO 10 I = 1, N
         READ (5,*) T(I)             ! read in data
   10 CONTINUE
      TEMP = THETA(N, T)
      WRITE (6,100) TEMP             ! theta for the original sample
  100 FORMAT (' THETA= ', F15.5)
      READ (5,*) ISEED               ! seed for random number generator
      CALL IRAND(ISEED)              ! initialize random number generator
      DO 20 I = 1, NBOOT
         DO 30 J = 1, N
            U = RAND()               ! random number between 0 and 1
C           Convert U to a random integer between 1 and N; MIN guards
C           against an installation whose RAND can return exactly 1.
            II = MIN(INT(U*N) + 1, N)
            TSTAR(J) = T(II)         ! jth element of bootstrap sample
   30    CONTINUE
         THSTAR(I) = THETA(N, TSTAR) ! compute bootstrap value
   20 CONTINUE
      THBAR = 0.0
      DO 40 I = 1, NBOOT
         THBAR = THBAR + THSTAR(I)/NBOOT          ! bootstrap mean
   40 CONTINUE
      THVAR = 0.0
      DO 50 I = 1, NBOOT
         THVAR = THVAR + (THSTAR(I) - THBAR)**2   ! bootstrap variance
   50 CONTINUE
C     Bootstrap estimate of standard error
      SDBOOT = SQRT(THVAR/(NBOOT - 1))
      WRITE (6,102) SDBOOT
  102 FORMAT (' BOOTSTRAP ESTIMATE OF STANDARD ERROR= ', F13.6)
      WRITE (6,*)
      WRITE (6,103)
  103 FORMAT (' BOOTSTRAP VALUES OF THETA: ')
      DO 60 I = 1, NBOOT
         WRITE (6,*) THSTAR(I)       ! for further analysis
   60 CONTINUE
      STOP
      END


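      REAL FUNCTION THETA(N, T)
      REAL T(N)
C     Compute the statistic of interest for the sample T(1), ..., T(N).
C     Placeholder for illustration only (replace as needed): the mean.
      THETA = 0.0
      DO 10 I = 1, N
         THETA = THETA + T(I)
   10 CONTINUE
      THETA = THETA/N
      RETURN
      END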
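The same computation can be written as a minimal Python sketch for readers without a FORTRAN compiler. It is not part of the original appendix. Python's `random.Random` stands in for the installation-dependent IRAND and RAND, and the function name `bootstrap_se` is hypothetical. It returns the bootstrap standard error together with the bootstrap replications, which the FORTRAN program writes out for further analysis.

```python
import math
import random
from typing import Callable, List, Optional, Sequence, Tuple


def bootstrap_se(stat: Callable[[Sequence[float]], float],
                 data: Sequence[float], nboot: int = 1000,
                 seed: Optional[int] = None) -> Tuple[float, List[float]]:
    """Bootstrap standard error in the manner of the appendix program.

    Draws nboot samples of size n with replacement, recomputes the statistic
    on each, and returns (standard deviation of the replications with
    divisor nboot - 1, list of bootstrap replications).
    """
    if nboot < 2:
        raise ValueError("nboot must be at least 2")
    rng = random.Random(seed)          # stands in for IRAND / RAND
    n = len(data)
    thstar: List[float] = []
    for _ in range(nboot):
        # randrange(n) gives an index in 0..n-1, the 0-based analogue of
        # II = INT(U*N) + 1, and cannot fall outside the array
        tstar = [data[rng.randrange(n)] for _ in range(n)]
        thstar.append(stat(tstar))
    thbar = sum(thstar) / nboot                      # bootstrap mean
    thvar = sum((v - thbar) ** 2 for v in thstar)    # sum of squared deviations
    return math.sqrt(thvar / (nboot - 1)), thstar
```

Usage: `se, reps = bootstrap_se(lambda xs: sum(xs) / len(xs), data, nboot=1000, seed=1)`. For the sample mean, `se` should be close to $\left[\sum_i (x_i - \bar x)^2/n\right]^{1/2}/\sqrt{n}$, with Monte Carlo error that shrinks as `nboot` grows.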